Markov Kernels and the Conditional Extreme Value Model

ABSTRACT

The classical approach to extreme value modelling for multivariate data is to assume that the joint distribution belongs to a multivariate domain of attraction. In particular, this requires that each marginal distribution be individually attracted to a univariate extreme value distribution. The domain of attraction condition may be phrased conveniently in terms of regular variation of the joint distribution on an appropriate cone. A more flexible model for data realizations of a random vector was proposed by Heffernan and Tawn [45], under which not all the components are required to belong to an extremal domain of attraction. Such a model accommodates varying degrees of asymptotic dependence between pairs of components. Instead of starting from the joint distribution, one assumes the existence of an asymptotic approximation to the conditional distribution of the random vector given that one of the components becomes extreme. Combined with the knowledge that the conditioning component belongs to a univariate domain of attraction, this leads to an approximation of the probability of certain risk regions.

When the model was originally proposed, the focus was on conditional distributions. This approach presents technical difficulties regarding the choice of version, but makes sense when dealing with Markov kernels. We place this approach within the more general framework of vague convergence of measures and multivariate regular variation on cones.


1. Overview

The classical approach to extreme value modelling for multivariate data is to assume that the joint distribution belongs to a multivariate domain of attraction. In particular, this requires that each marginal distribution be individually attracted to a univariate extreme value distribution. The domain of attraction condition may be phrased conveniently in terms of regular variation of the joint distribution on an appropriate cone; see Das and Resnick [4, Proposition 4.1].

A more flexible model for data realizations of a random vector was proposed by Heffernan and Tawn [10], under which not all the components are required to belong to an extremal domain of attraction. Such a model accommodates varying degrees of asymptotic dependence between pairs of components. Instead of starting from the joint distribution, Heffernan and Tawn assumed the existence of an asymptotic approximation to the conditional distribution of the random vector given one of the components, as that component becomes extreme. Combined with the knowledge that the conditioning component belongs to a univariate domain of attraction, this leads to an approximation for the joint distribution, given that one component is extreme (e.g., exceeds some high threshold). However, the focus on conditional distributions presents some technical difficulties regarding the choice of version.

This approach was subsequently formalized as the Conditional Extreme Value Model (CEVM) by Heffernan and Resnick [9] and Das and Resnick [4, 5] in terms of regular variation of the joint distributions, but taking place on a smaller cone than the one employed in multivariate extreme value theory. This is related to the concept of hidden regular variation; see Resnick [14].

We return to the formulation of Heffernan and Tawn [10] in terms of conditional distributions, placing it in a more formal context by drawing upon the theory for transition kernels in a domain of attraction developed in [15]. In particular, we assume that the dependence between a pair of random variables $(X,Y)$ is specified by a transition kernel $K$; this is appropriate, for example, in cases where one variable can be modeled as an explicit function of the other. In order to fit better with the study of extremes of a random vector, we extend the kernel domain of attraction condition used in [15] beyond standardized regular variation to accommodate general linear normalization in both the initial state and the distribution of the next state. We examine conditions under which this extends to a CEVM when combined with a marginal domain of attraction assumption, and we derive explicit formulas for the CEV limit measure in different cases. Also, through a number of revealing examples, we explore the properties of the normalization functions and the technicalities surrounding the choice of version of the conditional distribution and of the limit distribution $G$.


2. Background

We begin by presenting some necessary background material. First, we review the basics of extended regular variation, which features prominently in the formulation of the CEVM, as well as some concepts of univariate extreme value theory. We then introduce the Conditional Extreme Value Model formally and discuss its basic properties.


2.1. Extended Regular Variation. Regular variation and extended regular variation are important in the mathematical description of extreme and conditional extreme value theory. Standard references include [2, 6, 12, 13, 17]. The pair of functions $a:(0,\infty)\mapsto(0,\infty)$ and $f:(0,\infty)\mapsto\mathbb{R}$ are extended regularly varying (ERV) with parameters $\rho,k\in\mathbb{R}$ if, as $t\to\infty$,

$$\frac{a(tx)}{a(t)}\to x^{\rho} \quad\text{and}\quad \frac{f(tx)-f(t)}{a(t)}\to\psi(x), \qquad x>0, \tag{2.1}$$

[6, Appendix B.2], where

$$\psi(x)=\begin{cases}k\rho^{-1}(x^{\rho}-1), & \rho\neq 0,\\ k\log x, & \rho=0,\end{cases}\qquad x>0. \tag{2.2}$$


We will write this as $(a,f)\in\mathrm{ERV}_{\rho,k}$. Thus, $a\in\mathrm{RV}_{\rho}$, the class of regularly varying functions of index $\rho$. A useful identity is

$$\psi(x^{-1})=-x^{-\rho}\psi(x),\qquad x>0. \tag{2.3}$$
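As a quick sanity check on (2.2) and (2.3), the following Python sketch evaluates $\psi$ and tests both the identity (2.3) and the relation $\psi(uy)=u^{\rho}\psi(y)+\psi(u)$, which is used for the tail kernel in Section 4. The helper names `psi` and `check_identities` are illustrative, and the parameter values are arbitrary.

```python
import math


def psi(x: float, rho: float, k: float) -> float:
    """The function psi of (2.2): k*(x**rho - 1)/rho if rho != 0, else k*log(x)."""
    if x <= 0:
        raise ValueError("psi is defined for x > 0")
    if rho == 0:
        return k * math.log(x)
    return k * (x ** rho - 1.0) / rho


def _close(a: float, b: float) -> bool:
    return math.isclose(a, b, rel_tol=1e-12, abs_tol=1e-12)


def check_identities(rho: float, k: float, points=(0.3, 1.0, 2.5, 7.0)) -> bool:
    """Check (2.3) and psi(u*y) = u**rho * psi(y) + psi(u) on a few points."""
    for x in points:
        # Identity (2.3): psi(1/x) = -x**(-rho) * psi(x)
        if not _close(psi(1.0 / x, rho, k), -x ** (-rho) * psi(x, rho, k)):
            return False
        for y in points:
            # Relation behind the generalized tail kernel (4.3)
            if not _close(psi(x * y, rho, k), x ** rho * psi(y, rho, k) + psi(x, rho, k)):
                return False
    return True


for rho, k in [(0.5, 1.0), (-1.5, 2.0), (0.0, -0.7)]:
    print(f"rho={rho}, k={k}: identities hold -> {check_identities(rho, k)}")
```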


Note that this differs slightly from the usual definition of extended regular variation, which assumes $k=1$. If $\phi(x):=\lim_{t\to\infty}(f(tx)-f(t))/a(t)$ exists for $x>0$, then $a$ is necessarily regularly varying, and $\phi=\psi$, the function given in (2.2). Also, the convergences in (2.1) are locally uniform, implying that

$$\frac{a(tx_t)}{a(t)}\to x^{\rho}\quad\text{and}\quad\frac{f(tx_t)-f(t)}{a(t)}\to\psi(x)$$

whenever $x_t\to x>0$.


Furthermore, if $k\neq 0$ we obtain the following properties depending on the value of $\rho$. Recall the sign function $\operatorname{sgn}(u)=(u/|u|)\,1_{\{u\neq 0\}}$.

• If $\rho>0$, then $f\cdot\operatorname{sgn}(k)\in\mathrm{RV}_{\rho}$, and $f(t)/a(t)\to k/\rho$.

• If $\rho<0$, then $f(\infty)=\lim_{t\to\infty}f(t)$ exists and is finite, $(f(\infty)-f)\cdot\operatorname{sgn}(k)\in\mathrm{RV}_{-|\rho|}$, and $(f(\infty)-f(t))/a(t)\to k/|\rho|$.

• If $\rho=0$, i.e., $a$ is slowly varying, then $f\in\Pi(a)$ (see [6, Appendix B.2]). Suppose $k>0$. Then $f(\infty)\le\infty$ exists. If $f(\infty)=\infty$, then $f\in\mathrm{RV}_0$ and $f(t)/a(t)\to\infty$. If $f(\infty)<\infty$, then $f(\infty)-f\in\mathrm{RV}_0$, and $(f(\infty)-f(t))/a(t)\to\infty$. If $k<0$, then $-f$ has these properties.
2.2. Domains of Attraction. For $\gamma\in\mathbb{R}$, define $E_\gamma=\{x\in\mathbb{R}:1+\gamma x>0\}$. Observe that

$$E_\gamma=\begin{cases}(-\gamma^{-1},\infty), & \gamma>0,\\ (-\infty,\infty), & \gamma=0,\\ (-\infty,|\gamma|^{-1}), & \gamma<0.\end{cases} \tag{2.4}$$


The distribution $F$ of a random variable $Y$ is in the domain of attraction of an extreme value distribution $G_\gamma$ for some $\gamma\in\mathbb{R}$, written $F\in D(G_\gamma)$, if there exist functions $a(t)>0$ and $b(t)\in\mathbb{R}$ such that

$$F^{t}(a(t)y+b(t))\to G_\gamma([-\infty,y])$$

weakly as $t\to\infty$, where $G_\gamma([-\infty,y])=\exp\{-(1+\gamma y)^{-1/\gamma}\}$ for $y\in E_\gamma$ [6, 12]. This can be reformulated in terms of the tail of the distribution $F$ as

$$t\,P\left[\frac{Y-b(t)}{a(t)}>y\right]\to(1+\gamma y)^{-1/\gamma},\qquad y\in E_\gamma. \tag{2.5}$$

If $\gamma=0$, we interpret the limit as $e^{-y}$.
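For a concrete instance of (2.5), consider $Y\sim\text{Pareto}(\alpha)$ with $P[Y>x]=x^{-\alpha}$, $x\ge 1$, so that $\gamma=1/\alpha$. The sketch below assumes the centering $b(t)=t^{\gamma}$, which is what (2.6) gives for this $F$, together with the scale $a(t)=\gamma t^{\gamma}$; that scale is our own choice, not one taken from the text. With these choices the left side of (2.5) equals the limit exactly as soon as $b(t)(1+\gamma y)\ge 1$.

```python
import math


def pareto_survival(x: float, alpha: float) -> float:
    """P[Y > x] for Y ~ Pareto(alpha) supported on [1, inf)."""
    return 1.0 if x < 1.0 else x ** (-alpha)


def gev_tail(y: float, gamma: float) -> float:
    """Limit in (2.5): (1 + gamma*y)**(-1/gamma), read as exp(-y) when gamma == 0."""
    if gamma == 0:
        return math.exp(-y)
    base = 1.0 + gamma * y
    if base <= 0:
        raise ValueError("y must lie in E_gamma")
    return base ** (-1.0 / gamma)


def normalized_tail(t: float, y: float, alpha: float) -> float:
    """t * P[(Y - b(t)) / a(t) > y] with b(t) = t**gamma and a(t) = gamma * t**gamma."""
    gamma = 1.0 / alpha
    b = t ** gamma   # centering from (2.6) for this F
    a = gamma * b    # assumed scale; asymptotically equivalent choices also work
    return t * pareto_survival(a * y + b, alpha)


alpha = 2.0
for t in (10.0, 1e3, 1e6):
    print(t, normalized_tail(t, 1.5, alpha), gev_tail(1.5, 1.0 / alpha))
```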

If (2.5) holds for some functions $a$ and $b$, then it also holds with ([6, Theorem 1.1.6, p. 10])

$$b(t)=\left(\frac{1}{1-F}\right)^{\leftarrow}(t), \tag{2.6}$$

where $g^{\leftarrow}$ denotes the left-continuous inverse of a nondecreasing function $g$. Hence, by inversion, (2.5) implies that

$$\frac{b(tx)-b(t)}{a(t)}\to\frac{x^{\gamma}-1}{\gamma}\,1_{\{\gamma\neq 0\}}+\log x\,1_{\{\gamma=0\}},\qquad x>0, \tag{2.7}$$

i.e., $(a,b)\in\mathrm{ERV}_{\gamma,1}$. Furthermore, if functions $\tilde a>0$ and $\tilde b\in\mathbb{R}$ on $(0,\infty)$ are asymptotically equivalent to $a,b$, i.e., they satisfy

$$\frac{\tilde a(t)}{a(t)}\to 1\quad\text{and}\quad\frac{\tilde b(t)-b(t)}{a(t)}\to 0\qquad\text{as } t\to\infty,$$

then (2.5) and (2.7) hold with $a,b$ replaced by $\tilde a,\tilde b$. It follows that (2.5) is equivalent to $t\,P[b^{\leftarrow}(Y)>ty]\to y^{-1}$ for $y>0$, i.e., $1-F_{b^{\leftarrow}(Y)}\in\mathrm{RV}_{-1}$. This is known as standardization (see [12, Chapter 5]). We say that $Y$ is in the standardized domain of attraction, and write $F\in D(G_1)$, if

$$t\,P[Y>ty]\to y^{-1},\qquad y>0.$$


2.3. The Conditional Extreme Value (CEV) Model. Denote by $\overline{E}_\gamma$ the closure on the right of the interval $E_\gamma$. A bivariate random vector $(X,Y)$ on $\mathbb{R}^2$ follows a Conditional Extreme Value Model (CEVM) if there exist a non-null Radon measure $\mu$ on $[-\infty,\infty]\times\overline{E}_\gamma$ and functions $\alpha(t),a(t)>0$, $\beta(t),b(t)\in\mathbb{R}$ such that, as $t\to\infty$,

$$t\,P\left[\left(\frac{X-\beta(t)}{\alpha(t)},\frac{Y-b(t)}{a(t)}\right)\in\cdot\,\right]\xrightarrow{v}\mu(\cdot)\quad\text{in } M_+([-\infty,\infty]\times\overline{E}_\gamma), \tag{2.8}$$

where $\mu$ satisfies the conditional non-degeneracy conditions: for each $y\in E_\gamma$,

$$\begin{aligned}&\mu([-\infty,x]\times(y,\infty])\text{ is not a degenerate distribution in } x;\\ &\mu(\{\infty\}\times(y,\infty])=0.\end{aligned} \tag{2.9}$$

It is convenient to choose the normalization such that

$$H(x):=\mu([-\infty,x]\times(0,\infty])\text{ is a probability distribution on }[-\infty,\infty]. \tag{2.10}$$

See [4, 9] for details and [10] for background.

Some remarks: By applying the joint convergence (2.8) to rectangles $[-\infty,\infty]\times(y,\infty]$, we see that the distribution of $Y$ is necessarily attracted to $G_\gamma$ for some $\gamma$. Also, an important property is that the functions $\alpha,\beta$ are ERV for some $\rho,k\in\mathbb{R}$ [9, Proposition 1]. The limit measure $\mu$ in (2.8) is a product measure if and only if $(\rho,k)=(0,0)$ [9, Proposition 2].

Condition (2.9) is somewhat different from what is given in [4, 9], which contained a redundancy and allowed mass on the line $\{\infty\}\times(-\infty,\infty]$ through infinity. Mass on this line invalidates the convergence of types theorem [7, 11]. The theory in [4, 9] employs convergence of types arguments, which require no mass on the lines through $\{\infty\}$. Condition (2.8) entails $Y\in D(G_\gamma)$ and $\mu([-\infty,x]\times\{\infty\})=0$. We also require that $\mu(\{\infty\}\times(y,\infty])=0$, a fact not implied by (2.8). Example 3.6 presents a case where (2.8) holds for two distinct normalizations which are not asymptotically equivalent, yielding different limit measures. One of the limit measures has $\mu(\{\infty\}\times(y,\infty])>0$ and the other has $\mu(\{\infty\}\times(y,\infty])=0$.

3. Standard Case

Let $(X,Y)$ be a random vector on $\mathbb{R}^2$, with dependence specified by a transition kernel $K$:

$$P[X\in\cdot\mid Y=y]=K(y,\cdot),\qquad y\in\mathbb{R}.$$

We show that if the distribution of $Y$ is in an extremal domain of attraction, and $K$ belongs to the domain of attraction of a probability distribution $G$ (a notion to be defined precisely), then $(X,Y)$ follows a CEVM.

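We begin with the standard case, which means that $(X,Y)\in[0,\infty)^2$ and $F_Y\in D(G_1)$, i.e.,

$$t\,F_Y(t\,\cdot)\xrightarrow{v}\nu_1(\cdot)\quad\text{in } M_+(0,\infty] \text{ as } t\to\infty, \tag{3.1}$$

where $\nu_1((y,\infty])=y^{-1}$ for $y>0$, and $K\in D(G)$, meaning

$$K(t,t\,\cdot)\Rightarrow G(\cdot)\quad\text{on }[0,\infty]. \tag{3.2}$$

Here $\Rightarrow$ denotes weak convergence, and we write $\xi\sim G$ to mean $G(\cdot)=P[\xi\in\cdot]$.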

3.1. Standard CEVM Properties. Conditions (3.1) and (3.2) imply that $(X,Y)$ follows a CEVM, provided $G\neq\epsilon_0$, where $\epsilon_0$ denotes unit mass at $\{0\}$.

Theorem 3.1. Suppose that the joint distribution of the random vector $(X,Y)$ on $[0,\infty)^2$ satisfies (3.1) and (3.2), where $G$ is a probability distribution on $[0,\infty)$. Then

$$t\,P[(X,Y)\in t\,\cdot\,]\xrightarrow{v}\mu(\cdot)\quad\text{in } M_+([0,\infty]\times(0,\infty]), \tag{3.3}$$

with limit measure $\mu$ given by

$$\mu([0,x]\times(y,\infty])=\int_{(y,\infty]}\nu_1(du)\,P[\xi\le xu^{-1}]=\int_y^{\infty}G(x/u)\,\nu_1(du),\qquad x,y>0. \tag{3.4}$$

Furthermore, $\mu$ satisfies the conditional non-degeneracy conditions (2.9) provided $G\neq\epsilon_0$.

Proof. The convergence (3.3) is an application of Proposition ?? (a) (p. ??), with $a=1$ and $m=1$, and with $Y$ playing the role of $X_0$. From (3.5) below, we see that $\mu([0,x]\times(y,\infty])$ is continuous in $x$, and not constant provided $G\neq\epsilon_0$. Also, since $\mu((x,\infty]\times(y,\infty])=\int_{(y,\infty]}\nu_1(du)\,P[\xi>xu^{-1}]$, the property $\mu(\{\infty\}\times(y,\infty])=0$ follows from the fact that $G(\{\infty\})=0$. Therefore, $\mu$ satisfies (2.9). □

3.1.1. Properties of the limit measure $\mu$. By changing variables, $\mu$ can be expressed as

$$\begin{aligned}\mu([0,x]\times(y,\infty])&=\frac{1}{x}\int_0^{x/y}P[\xi\le u]\,du\\ &=y^{-1}P[\xi\le x/y]-x^{-1}E\big[\xi\,1_{\{\xi\le x/y\}}\big],\end{aligned} \tag{3.5}$$

showing that $\mu$ is continuous in $x$ and $y$, and that if $G$ has a density, then so does $\mu$. Note that the continuity in (3.5) holds even if $G$ is degenerate, i.e., $G=\epsilon_c$ for some $c>0$; see Example 3.4 (p. 6). Non-degeneracy of $G$ only becomes relevant in the non-standard case. Moreover, $\mu$ cannot be a product measure [4, Lemma 3.1].
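The integral (3.4) and the two lines of (3.5) can be compared numerically. The sketch below assumes $\xi\sim\text{Exp}(\lambda)$ (the case worked out in Example 3.2), evaluates (3.4) after the substitution $v=1/u$, which turns it into $\int_0^{1/y}G(xv)\,dv$, and uses the truncated mean $E[\xi 1_{\{\xi\le c\}}]=(1-e^{-\lambda c}(1+\lambda c))/\lambda$ for the second line. Simpson's rule is a generic quadrature choice, not something prescribed by the text.

```python
import math

LAM = 1.0  # assumed rate of xi ~ Exp(LAM)


def G(u: float) -> float:
    """P[xi <= u] for xi ~ Exp(LAM)."""
    return 1.0 - math.exp(-LAM * u) if u > 0 else 0.0


def simpson(f, lo: float, hi: float, n: int = 2000) -> float:
    """Composite Simpson rule on [lo, hi] with an even number n of panels."""
    n += n % 2
    h = (hi - lo) / n
    odd = sum(f(lo + i * h) for i in range(1, n, 2))
    even = sum(f(lo + i * h) for i in range(2, n, 2))
    return (f(lo) + f(hi) + 4 * odd + 2 * even) * h / 3


def mu_integral(x: float, y: float) -> float:
    """(3.4) after substituting v = 1/u: int_0^{1/y} G(x v) dv."""
    return simpson(lambda v: G(x * v), 0.0, 1.0 / y)


def mu_expectation(x: float, y: float) -> float:
    """Second line of (3.5): P[xi <= x/y]/y - E[xi 1{xi <= x/y}]/x."""
    c = x / y
    truncated_mean = (1.0 - math.exp(-LAM * c) * (1.0 + LAM * c)) / LAM
    return G(c) / y - truncated_mean / x


def mu_closed(x: float, y: float) -> float:
    """Closed form for the exponential case (Example 3.2)."""
    return 1 / y - 1 / (LAM * x) + math.exp(-LAM * x / y) / (LAM * x)


for x, y in [(0.5, 1.0), (2.0, 1.0), (3.0, 0.5)]:
    print(x, y, mu_integral(x, y), mu_expectation(x, y), mu_closed(x, y))
```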

From (3.5) we also observe that the $y$-axis through the origin is assigned mass proportional to $G(\{0\})$, since $\mu(\{0\}\times(y,\infty])=y^{-1}G(\{0\})$. Mass on vertical slices of space depends on $E\xi$, since $\mu((x,\infty]\times(0,\infty])=x^{-1}E\xi\le\infty$. In terms of conditional distributions, (3.3) implies

$$P[X\le tx\mid Y>t]\Rightarrow H(x):=\mu([0,x]\times(1,\infty])=\frac{1}{x}\int_0^x P[\xi\le u]\,du.$$

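3.1.2. Extending to a larger cone. Convergence (3.3) extends to standard regular variation on the larger cone $[0,\infty]^2\setminus\{0\}$, so that the distribution of $(X,Y)$ is in a bivariate domain of attraction, if and only if $F_X\in D(G_1)$ as well [4, Proposition 4.1]. In this case,

$$t\,P\big[t^{-1}(X,Y)\in[0,(x,y)]^c\big]\to\frac{1}{x}\left(1+\int_0^{x/y}P[\xi\le u]\,du\right), \tag{3.6}$$

since, by inclusion-exclusion, the limit equals $y^{-1}+x^{-1}-\mu((x,\infty]\times(y,\infty])$ with $\mu((x,\infty]\times(y,\infty])=x^{-1}\int_0^{x/y}P[\xi>u]\,du$. This implies that $E\xi\le 1$, because $\mu((x,\infty]\times(0,\infty])=x^{-1}E\xi$ cannot exceed the limiting mass $x^{-1}$ of $\{X>tx\}$, and the $x$-axis receives mass according to $\mu((x,\infty]\times\{0\})=x^{-1}(1-E\xi)$.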

3.1.3. Degenerate G; asymptotic independence. If $G=\epsilon_0$, then the convergence (3.3) holds with limit measure $\mu([0,x]\times(y,\infty])=y^{-1}$, but conditional non-degeneracy (2.9) fails, since all the mass lies on the $y$-axis, so $(X,Y)$ does not follow a standard CEVM. This is in fact a manifestation of asymptotic independence. Indeed,

$$P[X>tx\mid Y>t]\to 0$$

for any $x>0$, so, given that $Y$ is extreme (exceeding the threshold $u(t)=t$), it is very unlikely to observe $X$ to be similarly extreme. If the joint distribution of $(X,Y)$ is regularly varying on the larger cone $[0,\infty]^2\setminus\{0\}$, then

$$t\,P\big[t^{-1}(X,Y)\in[0,(x,y)]^c\big]\to x^{-1}+y^{-1},$$

which means that $X$ and $Y$ are asymptotically independent in the usual sense [9, Section 5]. In this case, $(X,Y)$ does not follow a standard CEVM because of degeneracy, although a CEVM may hold if $X$ is normalized differently, as in Section 4.

This suggests viewing the parameter $G(\{0\})$ as quantifying the “degree” of asymptotic dependence from $Y$ to $X$. For example, given $Y$, we could write $X$ as a mixture

$$X=WX_0+(1-W)X_1, \tag{3.7}$$

where $X_0$ and $Y$ are asymptotically independent, $X_1$ and $Y$ are asymptotically dependent, and $W\sim\text{Bernoulli}(G(\{0\}))$. This relates to the canonical form of the update function representation of $K$ (see Section ??, p. ??). Asymptotic dependence in the reverse direction, given large $X$, would then be quantified by $1-E\xi$, if appropriate. The latter phenomenon is hinted at by Segers [16] in his definition of the “back-and-forth tail chain” to approximate stationary Markov chains.

3.2. Examples. The following examples illustrate properties of the CEVM based on Markov kernels as in (3.2). First, given any distribution $G$ on $[0,\infty)$, we can construct a CEVM whose limit measure $\mu$ is built on $G$ as in (3.5). See [4, Example 8].

Example 3.1. Take $G$ to be any probability distribution on $[0,\infty)$ not concentrating at 0. Let $Y\sim\text{Pareto}(1)$ on $[1,\infty)$, $\xi\sim G$ independent of $Y$, and put $X=\xi Y$. A version of the conditional distribution is

$$K(y,\cdot)=P[X\in\cdot\mid Y=y]=P[\xi Y\in\cdot\mid Y=y]=G(y^{-1}\,\cdot\,),$$

and $K$ satisfies (3.2); in fact $K(t,t\,\cdot)=G(\cdot)$. Consequently, $(X,Y)$ follows a standard CEVM with limit measure as in (3.4). In fact, for $x,y>0$, we have

$$P[X\le x,\,Y>y]=\int_{(y,\infty]}K(u,[0,x])\,P[Y\in du]=\int_{y\vee 1}^{\infty}P[\xi\le xu^{-1}]\,u^{-2}\,du=\frac{1}{x}\int_0^{x/(y\vee 1)}P[\xi\le u]\,du.$$

Furthermore, $(X,Y)$ belongs to a standard bivariate domain of attraction (3.6) iff $F_X\in D(G_1)$ as well. The marginal distribution of $X=\xi Y$ is

$$F_X([0,x])=\frac{1}{x}\int_0^x P[\xi\le u]\,du=\frac{1}{x}\int_0^x G(u)\,du=H(x),$$

which has density $f_X(x)=x^{-1}\{G([0,x])-H([0,x])\}$ for $x>0$. Since

$$\lim_{t\to\infty}t\,P[X>tx]=\lim_{t\to\infty}\frac{1}{x}\int_0^{tx}P[\xi>u]\,du=x^{-1}E\xi\;(\le\infty),$$

$(X,Y)$ belongs to the standard domain of attraction iff $E\xi=1$. □

Using the Example 3.1 recipe, we explore the CEVM in a variety of special cases.
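Example 3.1 is easy to simulate. The following Monte Carlo sketch assumes $G=\text{Exp}(1)$, draws $Y\sim\text{Pareto}(1)$ by inversion, and estimates $t\,P[X\le tx,\,Y>ty]$. Because $ty\ge 1$ here, the expression for $P[X\le x,\,Y>y]$ in Example 3.1 shows that this quantity equals $\mu([0,x]\times(y,\infty])$ exactly, so the estimate should match the closed form up to sampling error. The sample size, seed and tolerance are arbitrary choices.

```python
import math
import random


def simulate_joint(n: int, t: float, x: float, y: float, seed: int = 2024) -> float:
    """Monte Carlo estimate of t * P[X <= t x, Y > t y] for X = xi * Y, with
    Y ~ Pareto(1) and xi ~ Exp(1) independent (Exp(1) is an assumed choice of G)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        y_val = 1.0 / (1.0 - rng.random())  # Pareto(1) by inversion: P[Y > s] = 1/s, s >= 1
        xi = rng.expovariate(1.0)           # xi ~ G = Exp(1)
        if xi * y_val <= t * x and y_val > t * y:
            hits += 1
    return t * hits / n


def mu_exp(x: float, y: float, lam: float = 1.0) -> float:
    """Closed form of (3.5) for xi ~ Exp(lam) (see Example 3.2)."""
    return 1.0 / y - 1.0 / (lam * x) + math.exp(-lam * x / y) / (lam * x)


estimate = simulate_joint(n=400_000, t=20.0, x=2.0, y=1.0)
print(f"Monte Carlo: {estimate:.4f}   limit measure (3.5): {mu_exp(2.0, 1.0):.4f}")
```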


Example 3.2. Choose $\xi\sim\text{Exp}(\lambda)$, so that $X=\lambda^{-1}YE$, where $E\sim\text{Exp}(1)$. The limit measure is

$$\mu([0,x]\times(y,\infty])=\frac{1}{x}\int_0^{x/y}(1-e^{-\lambda u})\,du=\frac{1}{y}-\frac{1}{\lambda x}+\frac{e^{-\lambda x/y}}{\lambda x},$$

and the marginal distribution of $X$ is $F_X(x)=1-(\lambda x)^{-1}(1-e^{-\lambda x})$, with density $f(x)=\lambda^{-1}x^{-2}(1-e^{-\lambda x})-x^{-1}e^{-\lambda x}$; $F_X$ satisfies (3.1) iff $\lambda=1$. □
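The marginal formulas of Example 3.2 can be checked numerically. The sketch compares the stated density with a central finite difference of $F_X$, and evaluates $t\,P[X>tx]$, which tends to $x^{-1}E\xi=(\lambda x)^{-1}$; the limit equals $x^{-1}$ only when $\lambda=1$. The function names and evaluation points are illustrative.

```python
import math


def F_X(x: float, lam: float) -> float:
    """Marginal d.f. in Example 3.2: 1 - (1 - exp(-lam x)) / (lam x), x > 0."""
    return 1.0 - (1.0 - math.exp(-lam * x)) / (lam * x)


def f_X(x: float, lam: float) -> float:
    """Stated density: (1 - exp(-lam x)) / (lam x^2) - exp(-lam x) / x."""
    return (1.0 - math.exp(-lam * x)) / (lam * x * x) - math.exp(-lam * x) / x


def scaled_tail(t: float, x: float, lam: float) -> float:
    """t * P[X > t x], computed from the survival function to avoid cancellation."""
    s = t * x
    return t * (1.0 - math.exp(-lam * s)) / (lam * s)


def central_difference(F, x: float, h: float = 1e-5) -> float:
    """Numerical derivative of F at x."""
    return (F(x + h) - F(x - h)) / (2 * h)


for lam in (1.0, 2.0):
    numeric = central_difference(lambda z: F_X(z, lam), 1.3)
    print(lam, f_X(1.3, lam), numeric, scaled_tail(1e6, 2.0, lam), 1 / (lam * 2.0))
```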

Next, we suppose $\xi$ is heavy-tailed.

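Example 3.3. For $\alpha>0$ let $\xi\sim\text{Pareto}(\alpha)$, so $1-G(x)=x^{-\alpha}$, $x\ge 1$. The limit measure assigns no mass to $\{(x,y):0<x<y\}$, and for $x\ge y>0$,

$$\mu([0,x]\times(y,\infty])=\begin{cases}\dfrac{1}{y}-\dfrac{\alpha}{\alpha-1}\,\dfrac{1}{x}+\dfrac{y^{\alpha-1}}{x^{\alpha}(\alpha-1)}, & \alpha>1,\\[2ex] \dfrac{1}{y}-\dfrac{1}{x}-\dfrac{\log x-\log y}{x}, & \alpha=1,\\[2ex] \dfrac{1}{y}+\dfrac{\alpha}{1-\alpha}\,\dfrac{1}{x}-\dfrac{1}{x^{\alpha}y^{1-\alpha}(1-\alpha)}, & \alpha<1.\end{cases}$$

□

When $\alpha<1$, $E\xi=\infty$ and $\mu((x,\infty]\times(y,\infty])=y^{-1}-\mu([0,x]\times(y,\infty])\to\infty$ as $y\downarrow 0$.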
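To check the three cases of Example 3.3, the sketch below compares them with direct numerical integration of $x^{-1}\int_1^{x/y}(1-u^{-\alpha})\,du$ (the first line of (3.5) with $P[\xi\le u]=1-u^{-\alpha}$ for $u\ge 1$), and illustrates the growth of $\mu((x,\infty]\times(y,\infty])$ as $y\downarrow 0$ when $\alpha<1$. Simpson's rule and the test points are our own choices.

```python
import math


def simpson(f, lo: float, hi: float, n: int = 4000) -> float:
    """Composite Simpson rule on [lo, hi] with an even number n of panels."""
    n += n % 2
    h = (hi - lo) / n
    odd = sum(f(lo + i * h) for i in range(1, n, 2))
    even = sum(f(lo + i * h) for i in range(2, n, 2))
    return (f(lo) + f(hi) + 4 * odd + 2 * even) * h / 3


def mu_numeric(x: float, y: float, alpha: float) -> float:
    """x^{-1} * int_1^{x/y} (1 - u^{-alpha}) du for x >= y; no mass when x < y."""
    if x < y:
        return 0.0
    return simpson(lambda u: 1.0 - u ** (-alpha), 1.0, x / y) / x


def mu_closed(x: float, y: float, alpha: float) -> float:
    """Piecewise formula of Example 3.3 (x >= y > 0)."""
    if x < y:
        return 0.0
    if alpha > 1:
        return 1 / y - alpha / ((alpha - 1) * x) + y ** (alpha - 1) / (x ** alpha * (alpha - 1))
    if alpha == 1:
        return 1 / y - 1 / x - (math.log(x) - math.log(y)) / x
    return 1 / y + alpha / ((1 - alpha) * x) - 1 / (x ** alpha * y ** (1 - alpha) * (1 - alpha))


def upper_mass(x: float, y: float, alpha: float) -> float:
    """mu((x, inf] x (y, inf]) = 1/y - mu([0, x] x (y, inf])."""
    return 1 / y - mu_closed(x, y, alpha)


for alpha in (0.5, 1.0, 2.5):
    for x, y in [(3.0, 1.0), (5.0, 2.0)]:
        print(alpha, x, y, mu_closed(x, y, alpha), mu_numeric(x, y, alpha))
print([upper_mass(1.0, y, 0.5) for y in (1e-2, 1e-4, 1e-6)])  # grows without bound
```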

It is also possible that $G$ is discrete, although the CEVM limit measure $\mu$ remains continuous.

Example 3.4. Suppose $\xi$ has discrete distribution $P[\xi=k]=a_k$, $k=0,1,\dots$. In this case, the limit measure is given by

$$\mu([0,x]\times(y,\infty])=\frac{1}{x}\int_0^{x/y}\Big(\sum_{k\le u}a_k\Big)du=\sum_{k=0}^{\lfloor x/y\rfloor}a_k\big(y^{-1}-kx^{-1}\big),$$

which is continuous in $x$ and $y$, and $F_X(x)=\sum_{k=0}^{\lfloor x\rfloor}a_k(1-kx^{-1})$. In particular, if $P[\xi=c]=1$ for some $c>0$, we obtain

$$\mu([0,x]\times(y,\infty])=(y^{-1}-cx^{-1})\,1_{\{x>cy>0\}}.$$

The conditional non-degeneracy conditions (2.9) are satisfied even though $G$ is degenerate. □
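The closed-form sum in Example 3.4 can be compared with the defining integral. The sketch assumes geometric weights $a_k=(1-q)q^k$ (an arbitrary illustrative choice), approximates $x^{-1}\int_0^{x/y}P[\xi\le u]\,du$ with a midpoint rule, and checks that $\mu$ is continuous across a point where $x/y$ is an integer, since the new atom enters the sum with weight $y^{-1}-kx^{-1}=0$.

```python
import math


def geometric_pmf(k: int, q: float = 0.5) -> float:
    """Assumed weights a_k = (1 - q) q^k, k = 0, 1, 2, ..."""
    return (1.0 - q) * q ** k


def mu_sum(x: float, y: float, pmf=geometric_pmf) -> float:
    """Example 3.4: sum_{k=0}^{floor(x/y)} a_k (1/y - k/x)."""
    return sum(pmf(k) * (1.0 / y - k / x) for k in range(math.floor(x / y) + 1))


def mu_midpoint(x: float, y: float, pmf=geometric_pmf, n: int = 200_000) -> float:
    """x^{-1} * int_0^{x/y} P[xi <= u] du, approximated by a midpoint rule."""
    c = x / y
    cdf, acc = [], 0.0
    for k in range(math.floor(c) + 1):  # cdf[j] = P[xi <= j]
        acc += pmf(k)
        cdf.append(acc)
    h = c / n
    total = sum(cdf[math.floor((i + 0.5) * h)] for i in range(n))
    return total * h / x


print(mu_sum(3.7, 1.2), mu_midpoint(3.7, 1.2))
# Continuity across x/y = 2
print(mu_sum(2.0 - 1e-9, 1.0), mu_sum(2.0 + 1e-9, 1.0))
```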
The final example shows how $G$ reflects asymptotic independence between $X$ and $Y$.

Example 3.5. Consider $Y\sim\text{Pareto}(1)$, and $Z$ independent of $Y$ such that $P[Z<\infty]=1$. Take

$$X=Y\vee Z=Y\,1_{\{Y\ge Z\}}+Z\,1_{\{Z>Y\}}.$$

Given that $Y$ is extreme, it is very unlikely that $Z$ is as extreme as $Y$, since they are independent. We have

$$K(y,[0,x])=P[Y\le x,\,Z\le x\mid Y=y]=P[Z\le x]\,1_{\{x\ge y\}},$$

and so

$$K(t,t[0,x])=P[Z\le tx]\,1_{\{x\ge 1\}}\to\epsilon_1([0,x])=1_{\{x\ge 1\}}=G([0,x]).$$

As in the previous example, the limit measure is

$$\mu([0,x]\times(y,\infty])=(y^{-1}-x^{-1})\,1_{\{x>y>0\}}.$$

On the other hand, consider $X'=Y\wedge Z=Y\,1_{\{Y\le Z\}}+Z\,1_{\{Z<Y\}}$. When $Y$ is large, it is likely that $X'=Z$, so $X'$ is asymptotically independent of $Y$. Indeed, in this case,

$$K(y,(x,\infty])=P[Y>x,\,Z>x\mid Y=y]=P[Z>x]\,1_{\{y>x\}},$$

from which

$$K(t,t(x,\infty])=P[Z>tx]\,1_{\{x<1\}}\to 0$$

for $x>0$. Therefore, $G=\epsilon_0$, and the conditional non-degeneracy conditions do not hold. □

3.3. Counter-examples. As expected, the converse to Theorem 3.1 can fail. If $(X,Y)$ follows a non-degenerate CEVM as in (3.3), and $K$ is a specific version of the conditional distribution $P[X\in\cdot\mid Y=y]$, it does not necessarily follow that there exists a distribution $G$ such that (3.2) holds. The failure of (3.2) can happen in two ways. There may exist a probability distribution $G$ on $[0,\infty]$ satisfying (3.2) with $G(\{\infty\})>0$, or it may be possible to obtain two distinct limit distributions along different subsequences $\{t_n\}$ and $\{t'_n\}$.

Example 3.6, where $G(\{\infty\})>0$, emphasizes the importance of assuming $\mu(\{\infty\}\times(y,\infty])=0$.

Example 3.6. As usual, take $Y\sim\text{Pareto}(1)$ and suppose that

$$X=WY+(1-W)Y^2,$$

where $W\sim\text{Bernoulli}(p)$ is independent of $Y$. Then

$$K(y,\cdot)=P[X\in\cdot\mid Y=y]=p\,\epsilon_y+(1-p)\,\epsilon_{y^2},$$

so

$$K(t,t\,\cdot)=p\,\epsilon_1+(1-p)\,\epsilon_t\Rightarrow p\,\epsilon_1+(1-p)\,\epsilon_\infty=G\quad\text{on }[0,\infty].$$

Indeed, for $0\le x<\infty$,

$$K(t,t[0,x])=p\,\epsilon_1([0,x])+(1-p)\,\epsilon_t([0,x])\to p\,\epsilon_1([0,x]),$$

showing that $G(\{\infty\})=1-p$.

On the other hand, for $x,y>0$,

$$\begin{aligned}P[X\le x,\,Y>y]&=p\,P[Y\le x,\,Y>y]+(1-p)\,P[Y^2\le x,\,Y>y]\\ &=p\left(\frac{1}{y\vee 1}-\frac{1}{x}\right)1_{\{x>y\vee 1\}}+(1-p)\left(\frac{1}{y\vee 1}-\frac{1}{\sqrt{x}}\right)1_{\{x>(y\vee 1)^2\}},\end{aligned}$$

so for $t$ sufficiently large,

$$\begin{aligned}t\,P[X\le tx,\,Y>ty]&=p\left(\frac{1}{y}-\frac{1}{x}\right)1_{\{x>y\}}+(1-p)\left(\frac{1}{y}-\sqrt{\frac{t}{x}}\right)1_{\{\sqrt{x/t}>y\}}\\ &\to p\left(\frac{1}{y}-\frac{1}{x}\right)1_{\{x>y\}}=\mu([0,x]\times(y,\infty]).\end{aligned} \tag{3.8}$$

The measure $\mu$ assigns positive mass to $\{\infty\}\times(y,\infty]$, since

$$\mu((x,\infty]\times(y,\infty])=y^{-1}-\mu([0,x]\times(y,\infty])=\frac{1}{y}\,1_{\{x\le y\}}+\left(\frac{1-p}{y}+\frac{p}{x}\right)1_{\{x>y\}},$$

and thus $\mu(\{\infty\}\times(y,\infty])=(1-p)y^{-1}$. Therefore, $\mu$ does not satisfy (2.9).

Under a different normalization, we obtain a proper limit $G$. Indeed, note that

$$K(t,t^2\,\cdot)=p\,\epsilon_{t^{-1}}+(1-p)\,\epsilon_1\Rightarrow p\,\epsilon_0+(1-p)\,\epsilon_1\sim\text{Bernoulli}(1-p),$$

and hence,

$$\begin{aligned}t\,P[X\le t^2x,\,Y>ty]&=p\big(y^{-1}-(tx)^{-1}\big)1_{\{x>y/t\}}+(1-p)\big(y^{-1}-x^{-1/2}\big)1_{\{x>y^2\}}\\ &\to p\,y^{-1}+(1-p)\big(y^{-1}-x^{-1/2}\big)1_{\{x>y^2\}}.\end{aligned}$$

This limit does satisfy (2.9). □
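The two normalizations in Example 3.6 can be compared by coding the exact finite-$t$ expressions derived above, which are valid once $ty\ge 1$. The sketch evaluates the first normalization, whose limit loses mass $(1-p)y^{-1}$ to $\{\infty\}\times(y,\infty]$, and the $t^2$ normalization, whose limit does not. The function names and parameter values are hypothetical.

```python
import math


def first_norm(t: float, x: float, y: float, p: float) -> float:
    """Exact t * P[X <= t x, Y > t y] in Example 3.6 (valid when t*y >= 1)."""
    part1 = p * (1 / y - 1 / x) if x > y else 0.0
    part2 = (1 - p) * (1 / y - math.sqrt(t / x)) if math.sqrt(x / t) > y else 0.0
    return part1 + part2


def first_limit(x: float, y: float, p: float) -> float:
    """Limit in (3.8): p (1/y - 1/x) 1{x > y}."""
    return p * (1 / y - 1 / x) if x > y else 0.0


def second_norm(t: float, x: float, y: float, p: float) -> float:
    """Exact t * P[X <= t^2 x, Y > t y] (valid when t*y >= 1)."""
    part1 = p * (1 / y - 1 / (t * x)) if x > y / t else 0.0
    part2 = (1 - p) * (1 / y - x ** -0.5) if x > y * y else 0.0
    return part1 + part2


def second_limit(x: float, y: float, p: float) -> float:
    """Limit p/y + (1 - p)(1/y - x^{-1/2}) 1{x > y^2}."""
    return p / y + ((1 - p) * (1 / y - x ** -0.5) if x > y * y else 0.0)


p, x, y = 0.3, 400.0, 1.5
for t in (1e2, 1e4, 1e8):
    print(t, first_norm(t, x, y, p), first_limit(x, y, p),
          second_norm(t, x, y, p), second_limit(x, y, p))

# Mass on {infinity} x (y, infinity]: 1/y minus the limit as x -> infinity
big_x = 1e12
print(1 / y - first_limit(big_x, y, p), (1 - p) / y)  # first normalization: positive
print(1 / y - second_limit(big_x, y, p))              # t^2 normalization: essentially zero
```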

Without condition (2.9), the convergence of types theorem fails, and it is possible to obtain different CEV limits under different normalizations. From (3.4), $\mu(\{\infty\}\times(y,\infty])=G(\{\infty\})y^{-1}$, and excluding defective distributions in Theorem 3.1 avoids cases like the previous one.

Here is an example of a CEVM where the normalized kernel $K$ does not have a unique limit.

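Example 3.7. Suppose $Y\sim\text{Pareto}(1)$, and define $X$ by

$$X=WY+(1-W)\,2Y\,1_{\{Y\in[0,\infty)\setminus\mathbb{N}\}},$$

where $W\sim\text{Bernoulli}(p)$ is independent of $Y$. In other words, given $Y=y$, $X$ takes the value $y$ or $2y$ according to a coin flip, unless $y$ is an integer, in which case $X$ will be either $y$ or $0$. The CEVM holds for $(X,Y)$. Since $P[Y\in\mathbb{N}]=0$, we have, for $y\ge 1$,

$$\begin{aligned}P[X\le x,\,Y>y]&=P[X\le x,\,Y>y,\,Y\in[0,\infty)\setminus\mathbb{N}]\\ &=p\,P(Y\le x,\,Y>y)+(1-p)\,P(2Y\le x,\,Y>y)\\ &=p(y^{-1}-x^{-1})\,1_{\{x>y\}}+(1-p)(y^{-1}-2x^{-1})\,1_{\{x>2y\}},\end{aligned}$$

and $t\,P[X\le tx,\,Y>ty]$ is given by the same expression whenever $ty\ge 1$. This limit satisfies (2.10) and the requirement that $\mu([0,\cdot\,]\times(y,\infty])$ not be degenerate for any $y$. However, the conditional distribution of $X$ given $Y$ is

$$K(y,\cdot)=\begin{cases}p\,\epsilon_y+(1-p)\,\epsilon_0, & y\in\mathbb{N},\\ p\,\epsilon_y+(1-p)\,\epsilon_{2y}, & y\in[0,\infty)\setminus\mathbb{N},\end{cases}$$

so

$$K(t,t\,\cdot)=\begin{cases}p\,\epsilon_1+(1-p)\,\epsilon_0, & t\in\mathbb{N},\\ p\,\epsilon_1+(1-p)\,\epsilon_2, & t\in[0,\infty)\setminus\mathbb{N}.\end{cases}$$

We obtain different limits along the sequences $t_n=n$ and $t'_n=n+\tfrac12$, and $K(t,t\,\cdot)$ does not converge. □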
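The failure of convergence in Example 3.7 comes down to whether $t$ is an integer, a distinction invisible to the joint law because $P[Y\in\mathbb{N}]=0$. The sketch below represents $K(t,t\,\cdot)$ as a dictionary of atoms and masses, using exact rational arithmetic (Python's `fractions.Fraction`) so that integrality of $t$ is decided without rounding; it evaluates the kernel along the integers and along the half-integers $n+\tfrac12$. The function name is illustrative.

```python
from fractions import Fraction


def scaled_kernel(t: Fraction, p: Fraction) -> dict:
    """K(t, t .) in Example 3.7 as {atom: mass}; atoms are on the scale x/t."""
    if t.denominator == 1:                       # t is an integer
        return {Fraction(1): p, Fraction(0): 1 - p}
    return {Fraction(1): p, Fraction(2): 1 - p}  # t is not an integer


p = Fraction(1, 3)
along_integers = [scaled_kernel(Fraction(n), p) for n in range(1, 6)]
along_half_integers = [scaled_kernel(Fraction(2 * n + 1, 2), p) for n in range(1, 6)]
print(along_integers[-1])       # p*eps_1 + (1-p)*eps_0 for every integer t
print(along_half_integers[-1])  # p*eps_1 + (1-p)*eps_2 for every half-integer t
```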


The technical difficulty highlighted in Example 3.7 is that conditional distributions of the form $P[X\in\cdot\mid Y=y]$ are only specified up to sets of $P[Y\in\cdot\,]$-measure zero. If $Y$ is absolutely continuous, we can alter the conditional probability for a countable number of $y$ without affecting the joint distribution. Consequently, it is difficult to construct a convergence theory based on conditional distributions. The best one can do is fix a version of the kernel or, if circumstances allow, choose a version of the kernel with some claim to naturalness based on smoothness. This is the reason the approach in [4, 9] is based on vague convergence of measures rather than convergence of conditional distributions as in [10].
4. General Normalization for X

The CEVM allows different normalizations for $X$ and $Y$, as in (2.8), but the formulation $K\in D(G)$ in (3.2) imposes the same normalization for both. We now allow general linear normalizations of $X$ in the kernel condition, continuing to assume $F_Y\in D(G_1)$ as in (3.1).

We will assume the following generalization of (3.2): there exist a non-degenerate probability distribution $G$ on $[-\infty,\infty)$ and scaling and centering functions $\alpha(t)>0$, $\beta(t)\in\mathbb{R}$ such that

$$K(t,[-\infty,\alpha(t)x+\beta(t)])\Rightarrow G([-\infty,x])\quad\text{on }[-\infty,\infty]. \tag{4.1}$$

4.1. CEVM Properties. Consider the decomposition

$$t\,P\left[\frac{X-\beta(t)}{\alpha(t)}\le x,\;Y>ty\right]=\int_{(y,\infty]}t\,P[Y\in t\,du]\,K(tu,[-\infty,\alpha(t)x+\beta(t)]). \tag{4.2}$$

By a variant of the continuous mapping theorem [15, Lemma 8.2], this will converge provided $K(tu(t),[-\infty,\alpha(t)x+\beta(t)])\to\phi_x(u)$ whenever $u(t)\to u>0$. Under what conditions does the kernel convergence (4.1) imply convergence of the joint distribution in (4.2)?

Given $\rho,k\in\mathbb{R}$, define the generalized tail kernel associated with a distribution $G$ on $[-\infty,\infty]$ as the transition function $\kappa_G:(0,\infty)\times\mathcal{B}[-\infty,\infty]\mapsto[0,1]$ given by

$$\kappa_G(y,A)=G\big(y^{-\rho}[A-\psi(y)]\big), \tag{4.3}$$

where $\psi$ is specified in (2.2) (p. 2). Note that $\kappa_G$ describes transitions between two different spaces. Since $\psi$ satisfies $\psi(uy)=u^{\rho}\psi(y)+\psi(u)$, a kernel $\kappa$ has the form (4.3) iff

$$\kappa(uy,A)=\kappa\big(y,u^{-\rho}[A-\psi(u)]\big).$$
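The scaling property of the generalized tail kernel can be verified numerically. The sketch assumes, purely for illustration, that $G$ is the standard normal distribution (written via `math.erfc`), represents $\kappa_G(y,[-\infty,x])$ by its distribution function, and checks $\kappa_G(uy,[-\infty,x])=\kappa_G(y,[-\infty,u^{-\rho}(x-\psi(u))])$, which is the displayed relation for $A=[-\infty,x]$ (multiplication by $u^{-\rho}>0$ preserves the orientation of the interval).

```python
import math


def psi(x: float, rho: float, k: float) -> float:
    """psi from (2.2)."""
    return k * math.log(x) if rho == 0 else k * (x ** rho - 1.0) / rho


def std_normal_cdf(z: float) -> float:
    """Illustrative choice of G: standard normal d.f. (erfc keeps the lower tail accurate)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def kappa_cdf(y: float, x: float, rho: float, k: float, G=std_normal_cdf) -> float:
    """kappa_G(y, [-inf, x]) = G(y^{-rho} (x - psi(y))), cf. (4.3)."""
    return G(y ** (-rho) * (x - psi(y, rho, k)))


def scaling_relation_holds(u: float, y: float, x: float, rho: float, k: float) -> bool:
    """Compare kappa(uy, [-inf, x]) with kappa(y, [-inf, u^{-rho}(x - psi(u))])."""
    lhs = kappa_cdf(u * y, x, rho, k)
    rhs = kappa_cdf(y, u ** (-rho) * (x - psi(u, rho, k)), rho, k)
    return math.isclose(lhs, rhs, rel_tol=1e-9, abs_tol=1e-15)


for rho, k in [(0.7, 1.0), (-0.4, 2.0), (0.0, 1.5)]:
    print(rho, k, all(scaling_relation_holds(u, y, 0.8, rho, k)
                      for u in (0.5, 2.0, 9.0) for y in (0.3, 1.0, 4.0)))
```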

Proposition 4.1. Let $K:(0,\infty)\times\mathcal{B}[-\infty,\infty]\mapsto[0,1]$ be a transition function satisfying (4.1), where $G$ is non-degenerate. There exists a family of non-degenerate distributions $\{G_u:0<u<\infty\}$ on $[-\infty,\infty]$ such that

$$K(tu,[-\infty,\alpha(t)x+\beta(t)])\Rightarrow G_u([-\infty,x])\quad\text{on }[-\infty,\infty],\quad 0<u<\infty, \tag{4.4}$$

as $t\to\infty$ if and only if $(\alpha,\beta)\in\mathrm{ERV}_{\rho,k}$ as in (2.1) (p. 2). In this case, $G_1=G$, and

$$K(tu_t,[-\infty,\alpha(t)x+\beta(t)])\Rightarrow\kappa_G(u,[-\infty,x])\quad\text{on }[-\infty,\infty] \tag{4.5}$$

whenever $u_t=u(t)\to u\in(0,\infty)$, i.e., the limit is a transition function of the form (4.3), where $\rho,k$ are the ERV parameters of $\alpha,\beta$.


Proof. Assume first that $(\alpha,\beta)\in\mathrm{ERV}_{\rho,k}$ and define

$$h_t(y;u)=\frac{\alpha(tu)}{\alpha(t)}\,y+\frac{\beta(tu)-\beta(t)}{\alpha(t)},$$

so that by (2.1), $h_t(y_t;u)\to h(y;u)=u^{\rho}y+\psi(u)$ whenever $y_t\to y\in\mathbb{R}$. For $u>0$,

$$K(tu,[-\infty,\alpha(t)x+\beta(t)])=K\big(tu,\;\alpha(tu)\{h_t^{-1}(\cdot\,;u)[-\infty,x]\}+\beta(tu)\big),$$

and applying the second continuous mapping theorem [1] to the weak convergence (4.1), we have

$$K(tu,[-\infty,\alpha(t)x+\beta(t)])\Rightarrow\big(G\circ h^{-1}(\cdot\,;u)\big)([-\infty,x]).$$

Hence, (4.4) holds with $G_u=\kappa_G(u,\cdot)$, and so $G_1=G$. Furthermore, we have $h_t(x_t;u_t)\to h(x;u)$ whenever $u_t\to u>0$, establishing (4.5).

For the converse, we employ convergence of types. Denote by $H_t$ the distribution $K(t,\cdot)$. Then, on the one hand, we have $H_t([-\infty,\alpha(t)x+\beta(t)])\Rightarrow G_1([-\infty,x])$. On the other hand, fixing $c>0$, we have

$$H_t([-\infty,\alpha(tc)x+\beta(tc)])=K\big((tc)c^{-1},[-\infty,\alpha(tc)x+\beta(tc)]\big)\Rightarrow G_{c^{-1}}([-\infty,x]).$$

Convergence of types yields that $(\alpha,\beta)\in\mathrm{ERV}_{\rho,k}$, and

$$G_{c^{-1}}([-\infty,x])=G_1([-\infty,c^{\rho}x+\psi(c)]),$$

with $\psi$ as in (2.2). Using the identity (2.3) (p. 2), we find that $G_u$ has the form (4.3), with $G=G_1$. □
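A toy kernel makes Proposition 4.1 concrete. Assume (hypothetically) that $K(s,\cdot)$ is the law of $s^{\rho}\xi+\psi(s)$ with $\xi\sim G$ standard normal, and take $\alpha(t)=t^{\rho}$, $\beta(t)=\psi(t)$. By the relation $\psi(tu)=t^{\rho}\psi(u)+\psi(t)$, this pair is in $\mathrm{ERV}_{\rho,k}$ with $h_t(y;u)=u^{\rho}y+\psi(u)$ for every $t$, (4.1) holds with equality, and $K(tu_t,[-\infty,\alpha(t)x+\beta(t)])=\kappa_G(u_t,[-\infty,x])$, which tends to $\kappa_G(u,[-\infty,x])$ as in (4.5). Real kernels need not have this exact structure.

```python
import math


def psi(x: float, rho: float, k: float) -> float:
    """psi from (2.2)."""
    return k * math.log(x) if rho == 0 else k * (x ** rho - 1.0) / rho


def G(z: float) -> float:
    """Illustrative limit law: standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def K_cdf(s: float, z: float, rho: float, k: float) -> float:
    """Hypothetical kernel: K(s, [-inf, z]) = P[s^rho * xi + psi(s) <= z], xi ~ G."""
    return G((z - psi(s, rho, k)) / s ** rho)


def h_t(y: float, u: float, t: float, rho: float, k: float) -> float:
    """h_t(y; u) from the proof, with alpha(t) = t^rho and beta(t) = psi(t)."""
    return (t * u) ** rho / t ** rho * y + (psi(t * u, rho, k) - psi(t, rho, k)) / t ** rho


rho, k, x, u = 0.4, 1.5, 0.7, 2.0
limit = G(u ** (-rho) * (x - psi(u, rho, k)))  # kappa_G(u, [-inf, x])
for t in (1e2, 1e4, 1e6):
    u_t = u + 1.0 / t                          # any u_t -> u would do
    value = K_cdf(t * u_t, t ** rho * x + psi(t, rho, k), rho, k)
    print(t, value, limit)
```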

A consequence of Proposition 4.1 is that, in order to obtain a CEVM using (4.1), a necessary and sufficient condition is that $\alpha,\beta$ are ERV. Requiring $G$ to be non-degenerate is necessary in order to apply convergence of types. For now, we continue to assume that $Y$ is in the standardized domain of attraction.


Theorem 4.1. Suppose $(X, Y)$ is a random vector on $\mathbb{R} \times [0, \infty)$ and (3.1) holds so that $F_Y \in D(G_1)$. Assume $K(y, \cdot) = P[X \in \cdot \mid Y = y]$ converges according to (4.1) for scaling and centering functions $\alpha(t) > 0$ and $\beta(t) \in \mathbb{R}$ and non-degenerate limit distribution $G$ on $[-\infty, \infty)$. As $t \to \infty$,

$$t P\left[\left(\frac{X - \beta(t)}{\alpha(t)}, \frac{Y}{t}\right) \in \cdot\,\right] \to \mu(\cdot) \quad \text{in } M_+([-\infty, \infty] \times (0, \infty]), \tag{4.6}$$

where $\mu$ satisfies (2.9), if and only if $\alpha, \beta \in \mathrm{ERV}_{\rho,k}$. In this case, $\mu$ is specified by

$$\mu([-\infty, x] \times (y, \infty]) = \int_{(y, \infty]} \nu_1(du)\, P[\xi \le u^{-\rho}(x - \psi(u))], \quad x \in \mathbb{R},\ y > 0, \tag{4.7}$$

with $\psi$ as in (2.2) and $\xi \sim G$. The expression (4.7) is continuous in $x$ and $y$ if $(\rho, k) \ne (0, 0)$, or if $G$ is continuous.


Proof. The convergence (4.6) to a limit $\mu$ satisfying (2.9) implies $\alpha, \beta \in$ ERV [9, Proposition 1]. Conversely, if $\alpha, \beta \in \mathrm{ERV}_{\rho,k}$, then the convergence (4.6) follows from Lemma ?? in light of (4.5), yielding the limit in (4.7). We check that $\mu([-\infty, x] \times (y, \infty])$ is continuous when $(\rho, k) \ne (0, 0)$. Indeed, applying dominated convergence, if $x_n \to x$, then

$$P[\xi \le u^{-\rho}(x_n - \psi(u))] \to P[\xi \le u^{-\rho}(x - \psi(u))]$$

for all except a countable number of $u$ corresponding to discontinuities of the distribution function. Continuity in $y$ is clear. Also, if $(\rho, k) = (0, 0)$, then $\mu([-\infty, x] \times (y, \infty]) = y^{-1} G([-\infty, x])$, which is continuous if $G$ is. In either case, $\mu([-\infty, x] \times (y, \infty])$ is non-degenerate in $x$ because $G$ is non-degenerate. Finally, $\mu(\{\infty\} \times (y, \infty]) = y^{-1} G(\{\infty\}) = 0$. Therefore, $\mu$ satisfies (2.9). □


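The limit measure is

$$\mu([-\infty, x] \times (y, \infty]) = \int_0^{y^{-1}} du\, P[\xi \le u^\rho x + \psi(u)], \tag{4.8}$$

where

$$u^\rho x + \psi(u) = \begin{cases} u^\rho (x + k\rho^{-1}) - k\rho^{-1}, & \rho \ne 0, \\ x + k \log u, & \rho = 0. \end{cases}$$

Changing variables, we obtain the following expressions for $\mu$ according to $(\rho, k)$:

$$\mu([-\infty, x] \times (y, \infty]) = \begin{cases} \dfrac{1}{|\rho|\, |x + k\rho^{-1}|^{1/\rho}} \displaystyle\int_{I_\rho} u^{(1-\rho)/\rho}\, P[\xi \le u\,\mathrm{sgn}(x + k\rho^{-1}) - k\rho^{-1}]\, du, & \rho \ne 0, \\[2ex] \dfrac{1}{|k|\, e^{x/k}} \displaystyle\int_{-\infty}^{x\,\mathrm{sgn}(k) - |k| \log y} e^{u/|k|}\, P[\xi \le u\,\mathrm{sgn}(k)]\, du, & \rho = 0,\ k \ne 0, \\[2ex] y^{-1} P[\xi \le x], & \rho = 0,\ k = 0, \end{cases} \tag{4.9}$$

where $I_\rho = (0, |x + k\rho^{-1}|\, y^{-\rho}]$ if $\rho > 0$ and $I_\rho = [|x + k\rho^{-1}|\, y^{-\rho}, \infty)$ if $\rho < 0$.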
Here $\mathrm{sgn}(v) = (v/|v|)\,\mathbf{1}_{\{v \ne 0\}}$, and we read the measure as $y^{-1} P[\xi \le -k\rho^{-1}]$ when $x = -k\rho^{-1}$ for the case $\rho \ne 0$. Continuity in $x$ and $y$ when $(\rho, k) \ne (0, 0)$ is apparent from the above expressions.
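As a numerical sanity check of the change of variables from (4.8) to (4.9), the Python sketch below evaluates both sides by one-dimensional midpoint quadrature. It is an illustration under stated assumptions, not part of the argument. $G$ is taken to be the standard normal distribution, which is a hypothetical choice. The helper names `psi`, `midpoint`, `mu_48` and `mu_49` are ours. In the $\rho \ne 0$ branch the sketch integrates over $(0, |x + k\rho^{-1}| y^{-\rho}]$ when $\rho > 0$ and over $[|x + k\rho^{-1}| y^{-\rho}, \infty)$ when $\rho < 0$. It does so through the substitution $u = |x + k\rho^{-1}| y^{-\rho} e^{s}$, so that one routine covers both signs. The truncation constant `S` is adequate only for moderate values of $\rho$ and $k$.

```python
import math

def G_normal(z):
    """Hypothetical choice of G: the standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def psi(u, rho, k):
    """ERV limit function (2.2): k(u^rho - 1)/rho, or k log u when rho == 0."""
    return k * math.log(u) if rho == 0 else k * (u ** rho - 1.0) / rho

def midpoint(f, a, b, n=20000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def mu_48(x, y, rho, k, G):
    """(4.8): integral over u in (0, 1/y] of G(u^rho x + psi(u))."""
    return midpoint(lambda u: G(u ** rho * x + psi(u, rho, k)), 0.0, 1.0 / y)

def mu_49(x, y, rho, k, G, S=40.0):
    """The three branches of (4.9); infinite ranges are truncated using S."""
    if rho == 0 and k == 0:
        return G(x) / y
    if rho == 0:
        sk = math.copysign(1.0, k)
        upper = x * sk - abs(k) * math.log(y)
        integral = midpoint(lambda u: math.exp(u / abs(k)) * G(u * sk),
                            upper - S * abs(k), upper)
        return integral / (abs(k) * math.exp(x / k))
    c = x + k / rho
    if c == 0:
        return G(-k / rho) / y
    bound, sc = abs(c) * y ** (-rho), math.copysign(1.0, c)
    # w = bound * e^s maps the integration range onto s in [-S, 0] (rho > 0)
    # or s in [0, S] (rho < 0); w^{(1-rho)/rho} dw becomes w^{1/rho} ds.
    def integrand(s):
        w = bound * math.exp(s)
        return w ** (1.0 / rho) * G(w * sc - k / rho)
    a, b = (-S, 0.0) if rho > 0 else (0.0, S)
    return midpoint(integrand, a, b) / (abs(rho) * abs(c) ** (1.0 / rho))

for rho, k in [(0.5, 0.8), (-0.5, 0.8), (0.0, 0.8), (0.0, -0.6), (0.0, 0.0)]:
    print(rho, k, round(mu_48(0.3, 0.7, rho, k, G_normal), 6),
          round(mu_49(0.3, 0.7, rho, k, G_normal), 6))
```

The two columns printed for each $(\rho, k)$ should agree to several decimal places. Replacing `G_normal` by the constant function 1 in `mu_48` returns $y^{-1}$, which is the total mass $\nu_1((y, \infty])$.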

We now demonstrate a case where $K$ satisfies (4.1), but (4.6) fails because $\alpha, \beta$ are not ERV.

Example 4.1. Consider $Y \sim \text{Pareto}(1)$, and $U \sim \text{Uniform}(0, 1)$, independent of $Y$. Put $X = U e^Y$. Then

$$K(y, [0, x]) = P[X \le x \mid Y = y] = P[U \le x e^{-y}] = x e^{-y} \wedge 1.$$

In this case, polynomial scaling is not strong enough to give an informative limit, since

$$K(t, t^\rho [0, x]) = x t^\rho e^{-t} \wedge 1 \to 0.$$

The appropriate normalization would be exponential, $\alpha(t) = e^t$:

$$K(t, \alpha(t)[0, x]) = x \wedge 1 \to x \wedge 1 = G([0, x]).$$

In fact, by the convergence of types theorem, this is the only normalization yielding a non-degenerate limit, up to asymptotic equivalence. However, since $\alpha$ is not regularly varying, Theorem 4.1 shows that $(X, Y)$ cannot follow a CEVM. Indeed, consider

$$\begin{aligned} t P[X \le \alpha(t) x, Y > ty] &= \int_{(y, \infty]} \nu_1(du)\, K(tu, [0, e^t x]) \\ &= \int_{(y \vee 1, \infty]} \nu_1(du) \{x e^{-t(u-1)} \wedge 1\} + \mathbf{1}_{\{y < 1\}} \int_{(y, 1]} \nu_1(du) \{x e^{-t(u-1)} \wedge 1\}. \end{aligned}$$

The first integral in the previous sum tends to 0. It is bounded by $x y^{-1} e^{-t(y-1)}$ when $y > 1$, and by $x/t$ when $y \le 1$. If $y < 1$, the second integral approaches $\nu_1(y, 1] = y^{-1} - 1$. Therefore, the limit is degenerate in $x$, violating conditional non-degeneracy (2.9).

In fact, we find that no choice of ERV normalization will lead to a CEVM. Indeed, suppose $\alpha, \beta$ are ERV. Then

$$\begin{aligned} t P[X \le \alpha(t) x + \beta(t), Y > ty] &= \int_{(y, \infty]} \nu_1(du)\, K(tu, [0, \alpha(t) x + \beta(t)]) \\ &= \int_{(y, \infty]} \nu_1(du) \{e^{-tu}(\alpha(t) x + \beta(t)) \wedge 1\} \to 0 \end{aligned}$$

(see Section 2.1 for a summary of the asymptotic properties of ERV functions).
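The degeneracy in Example 4.1 can also be seen numerically. The Python sketch below assumes $\nu_1(du) = u^{-2}\,du$, consistent with $\nu_1(y, 1] = y^{-1} - 1$ in the example. It evaluates $t P[X \le e^t x, Y > ty] = \int_{(y, \infty]} \nu_1(du)\{x e^{-t(u-1)} \wedge 1\}$ by the midpoint rule after the substitution $v = 1/u$. That substitution turns $\nu_1$ into Lebesgue measure on $(0, y^{-1})$. The function name `scaled_joint` and the grid size are our own choices. The point is only that, for large $t$, the value is close to $(y^{-1} - 1)^+$ whatever the value of $x$.

```python
import math

def scaled_joint(t, x, y, n=100000):
    """t P[X <= e^t x, Y > t y] in Example 4.1: the integral over (y, inf] of
    nu_1(du) * min(x e^{-t(u-1)}, 1), with nu_1(du) = u^{-2} du.
    After v = 1/u it is the integral of min(x e^{-t(1/v - 1)}, 1) over (0, 1/y)."""
    h = (1.0 / y) / n
    total = 0.0
    for i in range(n):
        v = (i + 0.5) * h
        # For small v the exponent is hugely negative and exp underflows to 0.0.
        total += min(x * math.exp(-t * (1.0 / v - 1.0)), 1.0)
    return total * h

for x in (0.5, 2.0):
    print(x, [round(scaled_joint(t, x, 0.5), 4) for t in (20, 200)])
```

For $y = 0.5$ both rows approach $y^{-1} - 1 = 1$ as $t$ grows, so the limit carries no information about $x$. For $y > 1$ the value is essentially 0.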


4.2. Standardization of X. Das and Resnick [4, Section 3.2] show that in certain cases, it is possible to standardize the $X$ variable. Denote by $x^*$ and $x_*$ the upper and lower endpoints of the distribution of $X$ respectively, i.e.,

$$x^* = \sup\{x : F_X(x) < 1\} \quad \text{and} \quad x_* = \inf\{x : F_X(x) > 0\}.$$

We will call $f : (0, \infty) \to (x_*, x^*)$ a standardization function if $f$ is monotone and $\lim_{x \to \infty} f(x) \in \{x_*, x^*\}$. Following [4, Section 3], we will restrict attention to standardization by such functions. However, we have inverted the definition given in [4], in the sense that we will be using $f^{\leftarrow}$ to standardize rather than $f$.

For the purpose of this section, we extend the definition of $f^{\leftarrow}$ in order to invert right-continuous monotone functions which are either increasing or decreasing. Define

$$f^{\leftarrow}(x) = \begin{cases} \inf\{y : f(y) \ge x\}, & f \text{ non-decreasing}, \\ \inf\{y : f(y) \le x\}, & f \text{ non-increasing}. \end{cases}$$

Note that $f^{\leftarrow}$ is left-continuous for $f$ non-decreasing and right-continuous for $f$ non-increasing. The main property we shall be using is that

$$\begin{cases} f^{\leftarrow}(x) \le y \iff x \le f(y), & f \text{ non-decreasing}, \\ f^{\leftarrow}(x) \le y \iff x \ge f(y), & f \text{ non-increasing}. \end{cases}$$

The distinction between the two cases is a technicality which should not cause confusion in the following discussion. Also, we will say that a monotone function $f$ has two “points of change” if there exist $x_1 < x_2 < x_3$ such that $f(x_1) < f(x_2) < f(x_3)$ for $f$ non-decreasing, and with the opposite inequalities in the non-increasing case.
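The generalized inverse and its main property can be illustrated with a short Python sketch. It approximates $f^{\leftarrow}(x)$ by bisection on a user-supplied bracket `[lo, hi]`, which is assumed to contain the infimum. The function name `gen_inverse` and its arguments are our own. The test functions, a floor step function and $e^{-y}$, are hypothetical examples of right-continuous monotone $f$.

```python
import math

def gen_inverse(f, x, lo, hi, increasing=True, tol=1e-12):
    """Approximate f^{<-}(x) by bisection on the bracket [lo, hi].

    For non-decreasing f this is inf{y : f(y) >= x}; for non-increasing f it is
    inf{y : f(y) <= x}.  In both cases the set is of the form [y0, inf) or
    (y0, inf), so the predicate is monotone in y and bisection applies;
    right-continuity of f makes the infimum belong to the set.
    """
    pred = (lambda y: f(y) >= x) if increasing else (lambda y: f(y) <= x)
    if pred(lo):
        return lo          # infimum lies at or below the bracket
    if not pred(hi):
        return math.inf    # set is empty within the bracket
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pred(mid):
            hi = mid
        else:
            lo = mid
    return hi

step = math.floor          # non-decreasing and right-continuous
print(gen_inverse(step, 1.5, -10.0, 10.0))                       # close to 2
print(gen_inverse(lambda y: math.exp(-y), 0.25, 0.0, 50.0,
                  increasing=False))                             # close to log 4
# Main property for non-decreasing f: f^{<-}(x) <= y  iff  x <= f(y)
for y in (0.5, 1.7, 3.2):
    assert (gen_inverse(step, 1.5, -10.0, 10.0) <= y) == (1.5 <= step(y))
```

Bisection is used only because it needs nothing beyond monotonicity of the predicate. With the tolerance `tol` as written, the returned value lies within about $10^{-12}$ of the infimum.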

If $(X, Y)$ satisfy (4.6) for some $\alpha > 0$ and $\beta$, then we say $(X, Y)$ can be standardized if there exists a standardization function $f$ such that

$$t P[t^{-1}(f^{\leftarrow}(X), Y) \in \cdot\,] \to \mu^*(\cdot) \quad \text{in } M_+([0, \infty] \times (0, \infty]), \tag{4.10}$$

where $\mu^*$ is a non-null Radon measure. If the limit $\mu$ in (4.6) satisfies the conditional non-degeneracy conditions (2.9), then standardization is possible if and only if $(\rho, k) \ne (0, 0)$, i.e., $\mu$ is not a product. Because of the dependence on $\alpha$ and $\beta$, we can characterize functions $f$ yielding (4.10) in the following way.


Proposition 4.2. Suppose $(X, Y)$ follow a CEVM, i.e., (4.6) holds with $\mu$ satisfying the conditional non-degeneracy conditions (2.9), and $(\rho, k) \ne (0, 0)$. Then a standardization function $f$ standardizes $(X, Y)$ in the sense of (4.10), where $\mu^*$ satisfies the conditional non-degeneracy conditions, if and only if

$$\frac{f(tx) - \beta(t)}{\alpha(t)} \to \varphi(x), \quad x > 0, \tag{4.11}$$

where $\varphi$ has at least two points of change. In this case, $\mu$ and $\mu^*$ are related by

$$\mu^*([0, x] \times (y, \infty]) = \mu(A_\varphi(x) \times (y, \infty]), \tag{4.12}$$

where

$$A_\varphi(x) = \begin{cases} [-\infty, \varphi(x)], & f \text{ non-decreasing}, \\ [\varphi(x), \infty], & f \text{ non-increasing}. \end{cases}$$


It follows that $\alpha, f \in$ ERV, although not necessarily with the same parameters as $\alpha, \beta$. However, depending on the case, $f$ can be expressed in terms of either $\beta$ or $\alpha$ (see [3, Proposition 2.3.3]).


Proof. Suppose $f$ is non-decreasing. Then for $x, y > 0$, we can write

$$t P\left[\frac{f^{\leftarrow}(X)}{t} \le x, \frac{Y}{t} > y\right] = t P\left[\frac{X - \beta(t)}{\alpha(t)} \le \frac{f(tx) - \beta(t)}{\alpha(t)}, \frac{Y}{t} > y\right]. \tag{4.13}$$

If $f$ satisfies (4.11), then (4.10) holds with

$$\mu^*([0, x] \times (y, \infty]) = \mu([-\infty, \varphi(x)] \times (y, \infty])$$

non-degenerate in $x$. On the other hand, if (4.10) holds, then (4.13) implies (4.11), and $\varphi$ has at least two points of increase because $\mu^*$ is non-degenerate in $x$. The mass at $\{\infty\}$ condition in (2.9) follows from the fact that $\lim_{x \to \infty} \varphi(x) = \infty$ if $f$ is non-decreasing (see (4.14) below). The case for $f$ non-increasing is similar, after reversing the inequality for $X$ on the right-hand side of (4.13). □


Assuming (4.11) and $\alpha, \beta \in \mathrm{ERV}_{\rho,k}$, write

$$\frac{f(tx) - \beta(t)}{\alpha(t)} = \frac{\alpha(tx)}{\alpha(t)} \cdot \frac{f(tx) - \beta(tx)}{\alpha(tx)} + \frac{\beta(tx) - \beta(t)}{\alpha(t)}$$

and $c = \varphi(1)$. We find that $\varphi$ has the form

$$\varphi(x) = \begin{cases} c x^\rho + k\rho^{-1}(x^\rho - 1), & \rho \ne 0, \\ c + k \log x, & \rho = 0. \end{cases} \tag{4.14}$$

That $\varphi$ is non-constant imposes the constraint that $c \ne 0$ if $\rho \ne 0$, $k = 0$.
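The decomposition behind (4.14) can be checked numerically. In the Python sketch below, the pair $\alpha(t) = t^\rho$, $\beta(t) = k(t^\rho - 1)/\rho$ (or $k \log t$ when $\rho = 0$) is a hypothetical member of $\mathrm{ERV}_{\rho,k}$. The function $f = \beta + c\,\alpha + r$ uses a remainder $r = o(\alpha)$ that we chose for illustration. The sketch compares the pre-limit ratio in (4.11) with the form (4.14) at a large value of $t$. The functions are only meant to exercise the limit; we do not check that $f$ maps into $(x_*, x^*)$.

```python
import math

def erv_pair(rho, k):
    """Hypothetical pair in ERV_{rho,k}: alpha(t) = t^rho and
    beta(t) = k (t^rho - 1) / rho, or k log t when rho == 0."""
    alpha = lambda t: t ** rho
    if rho == 0:
        beta = lambda t: k * math.log(t)
    else:
        beta = lambda t: k * (t ** rho - 1.0) / rho
    return alpha, beta

def phi_414(rho, k, c):
    """Limit form (4.14), where c = phi(1)."""
    if rho == 0:
        return lambda x: c + k * math.log(x)
    return lambda x: c * x ** rho + k * (x ** rho - 1.0) / rho

def max_error(rho, k, c, r, t=1e6, xs=(0.5, 2.0, 7.0)):
    """Largest gap between (f(tx) - beta(t)) / alpha(t) from (4.11) and
    phi(x) from (4.14), for f = beta + c * alpha + r with r = o(alpha)."""
    alpha, beta = erv_pair(rho, k)
    f = lambda s: beta(s) + c * alpha(s) + r(s)
    phi = phi_414(rho, k, c)
    return max(abs((f(t * x) - beta(t)) / alpha(t) - phi(x)) for x in xs)

print(max_error(1.5, 2.0, 0.7, math.log1p))                 # remainder log(1 + s)
print(max_error(0.0, 2.0, 0.7, lambda s: 1.0 / (1.0 + s)))  # remainder 1/(1 + s)
print(max_error(-0.5, 1.0, 0.7, lambda s: 1.0 / s))         # remainder 1/s
```

The printed gaps are of the order of $r(tx)/\alpha(t)$. For $\rho < 0$ that quantity decays only like $t^{-1/2}$ for the chosen remainder, so this case is the slowest of the three.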

What if the conditional distribution of $X$ given $Y$ in fact satisfies the kernel convergence assumption (4.1)? We can then apply any standardization directly to the conditional distribution via its transition function.


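Proposition 4.3. Suppose the transition function $K : (0, \infty) \times \mathcal{B}[-\infty, \infty] \to [0, 1]$ satisfies (4.1) for a probability distribution $G$ on $[-\infty, \infty)$. If $f$ is a standardization function satisfying (4.11), then the transition function $K_f : (0, \infty) \times \mathcal{B}[0, \infty] \to [0, 1]$ defined as

$$K_f(y, A) = K(y, f(A))$$

satisfies

$$K_f(t, t[0, x]) \Rightarrow G(A_\varphi(x)) =: G_f([0, x]) \quad \text{on } [0, \infty],$$

with $A_\varphi(x)$ as in (4.12). Conversely, suppose $G([0, \infty)) = 1$, and

$$K(t, t[0, x]) \Rightarrow G([0, x]) \quad \text{on } [0, \infty]. \tag{4.15}$$

Then, given ERV functions $\alpha > 0$, $\beta \in \mathbb{R}$ on $(0, \infty)$, if $f$ is a monotone function defined on $(0, \infty)$ satisfying (4.11), the transition function $K_f : (0, \infty) \times \mathcal{B}[-\infty, \infty] \to [0, 1]$ given by

$$K_f(y, A) = K(y, f^{\leftarrow}(A))$$

satisfies

$$K_f(t, [-\infty, \alpha(t)x + \beta(t)]) \Rightarrow G(A_{\varphi^{\leftarrow}}(x)) =: G_f([-\infty, x]) \quad \text{on } f([0, \infty]),$$

where

$$A_{\varphi^{\leftarrow}}(x) = \begin{cases} [0, \varphi^{\leftarrow}(x)], & f \text{ non-decreasing}, \\ [\varphi^{\leftarrow}(x), \infty], & f \text{ non-increasing}. \end{cases}$$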


Proof. Assume (4.1) first, and let $f$ be a non-decreasing standardization function satisfying (4.11). Then,

$$K_f(t, t[0, x]) = K(t, [-\infty, f(tx)]) = K\left(t, \alpha(t)\left[-\infty, \frac{f(tx) - \beta(t)}{\alpha(t)}\right] + \beta(t)\right) \Rightarrow G([-\infty, \varphi(x)]).$$

On the other hand, if $f$ satisfies (4.11) for $\alpha, \beta \in$ ERV, then inverting this relation yields

$$\frac{f^{\leftarrow}(\alpha(t)x + \beta(t))}{t} \to \varphi^{\leftarrow}(x), \quad x \in f((0, \infty)).$$

Consequently, for non-decreasing $f$,

$$K_f(t, [-\infty, \alpha(t)x + \beta(t)]) = K\big(t, t[0, t^{-1} f^{\leftarrow}(\alpha(t)x + \beta(t))]\big) \Rightarrow G([0, \varphi^{\leftarrow}(x)]).$$

The case for non-increasing $f$ is similar. □


Proposition 4.3 rephrases Das’s result [3, Proposition 2.3.3] on the existence of standardization functions in terms of conditional distributions rather than joint distributions. Indeed, suppose $K$ is a version of the conditional distribution $P[X \in \cdot \mid Y = y]$, satisfying (4.1), where $\alpha, \beta \in \mathrm{ERV}_{\rho,k}$ with $(\rho, k) \ne (0, 0)$. If $F_Y$ is in the standardized domain of attraction, then $(X, Y)$ follows a CEVM by Theorem 4.1. Furthermore, $(X, Y)$ can be standardized in the sense of (4.10) [3, Proposition 2.3.3 (1)], and the standardization function $f$ satisfies (4.11) by Proposition 4.2. This standard CEVM could equally have been obtained by applying Theorem 3.1 to the transition function $K_f$, a version of the conditional distribution $P[f^{\leftarrow}(X) \in \cdot \mid Y = y]$, giving (4.10) directly. That $(X, Y)$ follow a (non-standard) CEVM would then follow by unstandardizing the limit measure $\mu^*$.

In Section 3.1 we discussed the properties of the limit measure in the standard case. In particular, if $X$ belongs to the standardized domain of attraction, then $G$ necessarily satisfies the moment condition $0 < E\xi \le 1$ (recall $\xi \sim G$). This is due to the fact that $\mu((x, \infty] \times (0, \infty])$ is bounded by $x^{-1}$. Using the standardization approach discussed above, we can derive necessary conditions on $G$ when $X$ belongs to a general domain of attraction.

Suppose there exist normalizing functions $c(t) > 0$ and $d(t)$ such that

$$t P\left[\frac{X - d(t)}{c(t)} > x\right] \to (1 + \lambda x)^{-1/\lambda}, \quad x \in E_\lambda, \tag{4.16}$$

implying that $c, d \in \mathrm{ERV}_{\lambda,1}$ (see Section 2.2). Das and Resnick [4, Proposition 4.1] show that if $(X, Y)$ follow a CEVM and (4.16) holds, then the vector $(X, Y)$ belongs to a multivariate domain of attraction provided $\lim_{t \to \infty} \alpha(t)/c(t) \in [0, \infty)$.

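For simplicity, we consider the case where the conditional distribution of $X$ given $Y$ satisfies

$$K(t, [-\infty, c(t)x + d(t)]) \Rightarrow G([-\infty, x]),$$

i.e., (4.1) holds under the same normalization as in (4.16). Then $d$ is a standardization function satisfying (4.11), and

$$\varphi(x) = \begin{cases} \lambda^{-1}(x^\lambda - 1), & \lambda \ne 0, \\ \log x, & \lambda = 0 \end{cases}$$

(see (2.7)). Theorem 3.1 gives a standard CEVM for $(d^{\leftarrow}(X), Y)$, and furthermore, $d^{\leftarrow}(X)$ belongs to the standardized domain of attraction. Therefore, the distribution $G$ must satisfy

$$\int_0^\infty P[\xi > \varphi(x)]\, dx \le 1.$$

Depending on $\lambda$, this reduces to

$$\begin{cases} E\,\xi^{1/\lambda} \mathbf{1}_{\{\xi > 0\}} \le \lambda^{-1/\lambda}, & \lambda > 0, \\ E(-1/\xi)^{1/|\lambda|} \mathbf{1}_{\{\xi < 0\}} \le |\lambda|^{1/|\lambda|}, & \lambda < 0, \\ E\, e^{\xi} \le 1, & \lambda = 0. \end{cases}$$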


Thus, we obtain a different condition for each class of extreme value distribution. In the Frechet case, we have a bound on the $1/\lambda$-th moment of the right tail. If the domain of attraction is Weibull, this becomes an integrability condition near 0. Finally, in the Gumbel case, the right tail of $\xi$ is exponentially bounded, so all right-tail moments exist.
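In the Gumbel case ($\lambda = 0$), the condition $\int_0^\infty P[\xi > \log x]\,dx \le 1$ is exactly $E e^{\xi} \le 1$, by the substitution $x = e^u$. The Python sketch below checks this identity numerically when $\xi$ is normal with mean `m` and standard deviation `s`. That is a hypothetical choice of $G$ for which $E e^{\xi} = e^{m + s^2/2}$. The integration range and grid are our own choices. The second parameter pair gives $E e^{\xi} = e^{1/2} > 1$, so under the assumptions of this section such a $G$ would be excluded.

```python
import math

def normal_tail(z, m, s):
    """P[xi > z] for xi ~ N(m, s^2); erfc keeps small tail values accurate."""
    return 0.5 * math.erfc((z - m) / (s * math.sqrt(2.0)))

def gumbel_case_integral(m, s, lo=-40.0, hi=40.0, n=80000):
    """Midpoint approximation of the integral over x in (0, inf) of
    P[xi > log x], written with x = e^u (dx = e^u du), u truncated to [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(math.exp(lo + (i + 0.5) * h) * normal_tail(lo + (i + 0.5) * h, m, s)
                   for i in range(n))

for m, s in [(-1.0, 0.5), (0.0, 1.0)]:
    value = gumbel_case_integral(m, s)
    print(m, s, round(value, 6), round(math.exp(m + s * s / 2), 6), value <= 1.0)
```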


4.3.  Relation  to  the  Heffernan  and  Tawn  Model.  The  CEVM  of  Theorem  4.1  is  inspired  by 
the  statistical  model  proposed  by  Heffernan  and  Tawn  [10].  We  now  discuss  some  links  between 
this  work  and  the  CEVM. 

Where Heffernan and Tawn’s model is based on the convergence of conditional distributions, as in (4.1), the CEVM focuses on limits of joint distributions. Theorem 4.1 shows that Heffernan and Tawn’s assumption [10, Equation (3.1)] leads to a CEVM provided the normalization functions $\alpha$ and $\beta$ are ERV. The fact that their convergence is required to hold at all points $x$ suggests that they are expecting a continuous limit. Instead, we have framed the assumption in the more theoretically appealing context of weak convergence. Also, Heffernan and Tawn standardize the conditioning variable to a Gumbel domain of attraction rather than Frechet, which is our condition (3.1), but this is a minor point.

Also, Example 3.7 demonstrates a theoretical disadvantage to working with conditional distributions. A condition such as (4.1) tacitly, if not explicitly, assumes a particular version of the conditional distribution. This issue cannot be ignored, since Example 3.7 shows that (4.1) holding for one particular version does not imply that it holds for every version. This question of version is not addressed by Heffernan and Tawn. However, it should not pose a problem if we assume that the distributions are absolutely continuous, as is common in statistical contexts.

Another interesting point concerns the normalization functions $\alpha$ and $\beta$. If a non-degenerate CEVM holds, then these are necessarily ERV. It is not clear whether Heffernan and Tawn recognized this as a theoretical result, but they do assume a parametric form for these functions which is very similar to ERV. They specify

$$\alpha(y) = b_{|i}(y) := y^{b_{|i}} = y^\rho$$

for some constant $\rho < 1$ and

$$\beta(y) = a_{|i}(y) = \begin{cases} a y, & 0 \le \rho < 1, \text{ with } a \in [0, 1], \\ c - d \log y, & \rho < 0, \text{ with } a = 0,\ c \in \mathbb{R},\ d \in [0, 1]. \end{cases}$$

Although more general models are possible, the form of the ERV limit function $\psi$ in (2.2) suggests that a parametric approach is indeed reasonable.


5.  General  Normalizations  for  both  X  and  Y 

Up until now, we have been assuming that $Y$ belongs to the standardized Frechet domain of attraction: $t P[Y > ty] \to y^{-1}$ for $y > 0$. We wish to extend the result of Theorem 4.1 to the case where $Y$ belongs to a general domain of attraction:

$$t P[Y > a(t) y + b(t)] \to (1 + \gamma y)^{-1/\gamma}, \quad y \in E_\gamma, \tag{5.1}$$

where $E_\gamma := \{y : 1 + \gamma y > 0\}$. See Section 2.2 for further details on domains of attraction. We will assume that $b(t)$ is given by (2.6).

An important consideration in the previous development is that convergence results depend on properties of the particular choice of version $K(y, \cdot)$ of the conditional distribution $P[X \in \cdot \mid Y = y]$. Because $Y$ is now normalized according to $a$ and $b$, the condition (4.1) may no longer be sufficient to obtain a general CEVM limit, as in (2.8).

If it were known that $K(a(t)u + b(t), [-\infty, \alpha(t)x + \beta(t)]) \to \phi_x(u)$ for $u > 0$, then (2.8) should follow from arguments similar to those in Section 4.1. On the other hand, Heffernan and Resnick [9] argue that (2.8) reduces to (4.6) by standardizing $Y$ using the transformation $Y \mapsto b^{\leftarrow}(Y)$. Hence, if $K^*$, a specific version of $P[X \in \cdot \mid b^{\leftarrow}(Y) = y]$, satisfies (4.1), then $(X, b^{\leftarrow}(Y))$ follows a CEVM under appropriate normalization of $X$ by Theorem 4.1, and (2.8) should follow from (4.6) by untransforming. We now examine the consistency of these two approaches.


5.1. Kernel Asymptotics. The transition function $K : (-\infty, \infty) \times \mathcal{B}[-\infty, \infty] \to [0, 1]$ will continue to denote a specific version of the conditional distribution of $X$ given $Y$, i.e.,

$$K(y, \cdot) = P[X \in \cdot \mid Y = y].$$

Moving towards the transformation approach described above, we first argue that we can express a version of the conditional distribution of $X$ given $b^{\leftarrow}(Y)$ in terms of $K$.

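First, recall that the convergence (5.1), where $b$ is given by (2.6), implies that $a, b \in \mathrm{ERV}_{\gamma,1}$. Hence, $a \in \mathrm{RV}_\gamma$, and

$$\frac{b(tx) - b(t)}{a(t)} \to \begin{cases} \dfrac{x^\gamma - 1}{\gamma}, & \gamma \ne 0, \\ \log x, & \gamma = 0, \end{cases} \quad x > 0. \tag{5.2}$$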
Inverting (5.2) gives

$$\frac{b^{\leftarrow}(a(t)x + b(t))}{t} \to \begin{cases} (1 + \gamma x)^{1/\gamma}, & \gamma \ne 0, \\ e^x, & \gamma = 0, \end{cases} \quad x \in E_\gamma. \tag{5.3}$$

Furthermore, recall that if $\tilde b$ is any function on $(0, \infty)$ satisfying

$$\frac{\tilde b(t) - b(t)}{a(t)} \to 0 \quad \text{as } t \to \infty, \tag{5.4}$$

then (5.1), (5.2), and (5.3) hold with $b$ replaced by $\tilde b$. We now verify that we can choose such a $\tilde b$ which is invertible.

Lemma 5.1. There exists a function $b^*$ satisfying (5.4) that is continuous and strictly monotone.

Proof. We consider cases on $\gamma$. If $\gamma = 0$, then $b \in \Pi(a)$. Then we can find $\tilde b$ continuous, strictly increasing such that $(\tilde b(t) - b(t))/a(t) \to 1$ by [12, Proposition 0.16]. The choice $b^*(x) = \tilde b(e^{-1}x)$ satisfies (5.4). Otherwise, suppose $\gamma > 0$. Then $b \in \mathrm{RV}_\gamma$, and $b(t)/a(t) \to \gamma^{-1}$ [6, Theorem B.2.2 (1)]. Consequently, [13, Proposition 2.6 (vii)] gives a continuous, strictly increasing function $b^* \sim b$. Writing

$$\frac{b^*(t) - b(t)}{a(t)} = \frac{b(t)}{a(t)} \left(\frac{b^*(t)}{b(t)} - 1\right)$$

shows that $b^*$ satisfies (5.4). Finally, if $\gamma < 0$, then $b(\infty) = \lim_{t \to \infty} b(t)$ exists finite, $b(\infty) - b \in \mathrm{RV}_\gamma$, and $(b(\infty) - b(t))/a(t) \to -\gamma^{-1}$. Choose $\tilde b$ continuous, strictly decreasing, with $\tilde b \sim (b(\infty) - b)$, and set $b^* = b(\infty) - \tilde b$. □

Henceforth, $b^*$ will denote a continuous, strictly monotone function satisfying (5.4). The advantage to working with $b^*$ is that $b^{*\leftarrow}(b^*(x)) = b^*(b^{*\leftarrow}(x)) = x$. By (5.2), $Y^* = b^{*\leftarrow}(Y)$ belongs to the standard domain of attraction when (5.1) holds:

$$t P[Y^* > ty] = t P\left[\frac{Y - b^*(t)}{a(t)} > \frac{b^*(ty) - b^*(t)}{a(t)}\right] \to y^{-1}, \quad y > 0.$$

We argue that when $K(y, \cdot) = P[X \in \cdot \mid Y = y]$, the transition function

$$K^*(y, \cdot) := K(b^*(y), \cdot) \tag{5.5}$$

is a version of the conditional distribution $P[X \in \cdot \mid Y^* = y]$.

Proposition 5.1. For measurable $A$ and $y > 0$, we have

$$P[X \in A, Y^* > y] = \int_{(y, \infty)} K(b^*(u), A)\, P[Y^* \in du]. \tag{5.6}$$

Proof. Write

$$\begin{aligned} P[X \in A, Y^* > y] &= P[X \in A, Y > b^*(y)] = \int_{(b^*(y), \infty)} K(u, A)\, P[Y \in du] \\ &= \int_{(b^*(y), \infty)} K(b^*(b^{*\leftarrow}(u)), A)\, P[Y \in du], \end{aligned}$$

using the fact that $b^*(b^{*\leftarrow}(u)) = u$ for all $u$, and change variables according to the transformation $T = b^{*\leftarrow}$. Since $T^{-1}(y, \infty) = \{x : b^{*\leftarrow}(x) > y\} = (b^*(y), \infty)$, the result follows. □
Note that (5.6) is not necessarily true for the function $b$ given by (2.6), unless $P[b(b^{\leftarrow}(Y)) \ne Y] = 0$.
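Proposition 5.1 can be checked by simulation in the setting of Example 5.1. There $Y \sim \text{Exp}(1)$ and $X = U e^Y$. The function $b(t) = \log t$ is already continuous and strictly increasing, so we may take $b^* = b$. Then $Y^* = e^Y$ is Pareto(1), and $K^*(y, [0, x]) = K(\log y, [0, x]) = x y^{-1} \wedge 1$. For $y \ge 1$ and $A = [0, x]$, the right-hand side of (5.6) becomes $\int_y^\infty (x/u \wedge 1)\, u^{-2}\, du = \int_0^{1/y} (xv \wedge 1)\, dv$. This equals $x/(2y^2)$ if $x \le y$ and $y^{-1} - (2x)^{-1}$ if $x > y$. The Python sketch compares a Monte Carlo estimate of the left-hand side with this integral. The sample size, seed and function names are arbitrary choices of ours.

```python
import math
import random

def lhs_monte_carlo(x, y, n=200000, seed=12345):
    """Estimate P[X <= x, Y* > y] with Y ~ Exp(1), X = U e^Y and Y* = e^Y."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        ystar = math.exp(rng.expovariate(1.0))
        # U is drawn only when needed; each draw is still an independent uniform.
        if ystar > y and rng.random() * ystar <= x:
            hits += 1
    return hits / n

def rhs_kernel(x, y, m=20000):
    """Right-hand side of (5.6) with K*(u, [0, x]) = min(x/u, 1) and
    P[Y* in du] = u^{-2} du on [1, inf); after v = 1/u it is the integral
    of min(x v, 1) over (0, 1/y), evaluated by the midpoint rule."""
    h = (1.0 / y) / m
    return h * sum(min(x * (i + 0.5) * h, 1.0) for i in range(m))

for x, y in [(3.0, 2.0), (1.5, 2.0)]:
    print(x, y, round(lhs_monte_carlo(x, y), 4), round(rhs_kernel(x, y), 4))
```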

Next, we show that the two approaches to the CEVM discussed at the beginning of Section 5, the direct approach and the standardization approach, are indeed consistent. That $K^*$ converges to a family of distributions under scaling of the initial state, in the sense of (4.4), is equivalent to the same for $K$ with initial state normalized by $a$ and $b$.


Proposition 5.2. Suppose $Y$ is a random variable with distribution satisfying (5.1), and let $K^*$ be given by (5.5). Given normalization functions $\alpha(t) > 0$ and $\beta(t) \in \mathbb{R}$, there exists a transition function $\phi^* : (0, \infty) \times \mathcal{B}[-\infty, \infty] \to [0, 1]$ such that, as $t \to \infty$,

$$K^*(tu_t, [-\infty, \alpha(t)x + \beta(t)]) \Rightarrow \phi^*(u, [-\infty, x]) \quad \text{on } [-\infty, \infty] \tag{5.7}$$

whenever $u_t \to u \in (0, \infty)$, if and only if there exists a transition function $\phi : E_\gamma \times \mathcal{B}[-\infty, \infty] \to [0, 1]$ such that, as $t \to \infty$,

$$K(a(t)u_t + b(t), [-\infty, \alpha(t)x + \beta(t)]) \Rightarrow \phi(u, [-\infty, x]) \quad \text{on } [-\infty, \infty] \tag{5.8}$$

whenever $u_t \to u \in E_\gamma$. If these convergences hold, then

(i) $\alpha, \beta \in$ ERV;

(ii) $\phi^* = \kappa_{G^*}$, a generalized tail kernel (4.3) with $G^* = \phi^*(1, \cdot)$;

(iii) $\phi(u, A) = \kappa_G((1 + \gamma u)^{1/\gamma}, A)$, where $\kappa_G$ is a generalized tail kernel with $G = \phi(0, \cdot)$; and

(iv) the two transition functions are related by $G = G^*$.


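Proof. Abbreviate $a_t = a(t)$ and $b_t = b(t)$. The convergences (5.2) and (5.3) are in fact locally uniform on $(0, \infty)$ (see Section 2.1). Since $b^*$ satisfies (5.4), it follows that

$$\frac{b^*(tu_t) - b_t}{a_t} \to \frac{u^\gamma - 1}{\gamma} \quad \text{whenever } u_t \to u \in (0, \infty),$$

and

$$\frac{b^{*\leftarrow}(a_t u_t + b_t)}{t} \to (1 + \gamma u)^{1/\gamma} \quad \text{whenever } u_t \to u \in E_\gamma,$$

with the usual interpretations $\log u$ and $e^u$ when $\gamma = 0$. Assuming (5.7), for $u_t \to u \in E_\gamma$, we have

$$\begin{aligned} K(a(t)u_t + b(t), [-\infty, \alpha(t)x + \beta(t)]) &= K\big(b^*(t\{t^{-1} b^{*\leftarrow}(a_t u_t + b_t)\}), [-\infty, \alpha(t)x + \beta(t)]\big) \\ &= K^*\big(t\{t^{-1} b^{*\leftarrow}(a_t u_t + b_t)\}, [-\infty, \alpha(t)x + \beta(t)]\big) \\ &\Rightarrow \phi^*((1 + \gamma u)^{1/\gamma}, [-\infty, x]) =: \phi(u, [-\infty, x]). \end{aligned}$$

Conversely, if (5.8) holds, then for $u_t \to u > 0$,

$$\begin{aligned} K^*(tu_t, [-\infty, \alpha(t)x + \beta(t)]) &= K\big(a_t \cdot a_t^{-1}(b^*(tu_t) - b_t) + b_t, [-\infty, \alpha(t)x + \beta(t)]\big) \\ &\Rightarrow \phi(\gamma^{-1}(u^\gamma - 1), [-\infty, x]) =: \phi^*(u, [-\infty, x]). \end{aligned}$$

In either case, $G := \phi(0, \cdot) = \phi^*(1, \cdot) =: G^*$. Proposition 4.1 shows that $\alpha$ and $\beta$ are ERV and $\phi^* = \kappa_{G^*}$. Consequently, $\phi(u, \cdot) = \kappa_G((1 + \gamma u)^{1/\gamma}, \cdot)$. □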


Therefore, by Proposition 4.1, if there exists a non-degenerate distribution $G$ on $[-\infty, \infty)$ such that

$$K^*(t, [-\infty, \alpha(t)x + \beta(t)]) = K(b^*(t), [-\infty, \alpha(t)x + \beta(t)]) \Rightarrow G([-\infty, x]) \tag{5.9}$$

with $\alpha, \beta \in$ ERV, then (5.8) holds.
How can we apply Proposition 5.2 starting from an assumption like (5.9) on the kernel $K$ rather than $K^*$? Because $b^*(b^{*\leftarrow}(t)) = t$, (5.9) can be written as

$$K(t, [-\infty, \alpha \circ b^{*\leftarrow}(t)\, x + \beta \circ b^{*\leftarrow}(t)]) \Rightarrow G([-\infty, x]) \quad \text{as } t \to y^*,$$

where $y^* = \sup\{y : F_Y(y) < 1\}$ denotes the upper endpoint of the distribution of $Y$. Therefore, we require there to exist a non-degenerate distribution $G$ and normalization functions $\tilde\alpha > 0$ and $\tilde\beta$ such that

$$K(t, [-\infty, \tilde\alpha(t)x + \tilde\beta(t)]) \Rightarrow G([-\infty, x]) \quad \text{as } t \to y^*, \tag{5.10}$$

and $\alpha = \tilde\alpha \circ b^*$, $\beta = \tilde\beta \circ b^* \in$ ERV.


5.2.  CEVM  Properties.  Using  the  standardization  approach  discussed  in  the  previous  section, 
we  obtain  a  CEVM  when  Y  belongs  to  a  general  domain  of  attraction. 


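Theorem 5.1. Suppose $(X, Y)$ is a random vector on $\mathbb{R}^2$, where $F_Y \in D(G_\gamma)$ as in (5.1), and $K(y, \cdot) = P[X \in \cdot \mid Y = y]$ converges according to (5.10), for some normalizing functions $\tilde\alpha > 0$ and $\tilde\beta \in \mathbb{R}$ and non-degenerate limit distribution $G$ on $[-\infty, \infty)$. Let $b^*$ be the function satisfying (5.4) given by Lemma 5.1, and put $\alpha = \tilde\alpha \circ b^*$, $\beta = \tilde\beta \circ b^*$. Then, as $t \to \infty$,

$$t P\left[\left(\frac{X - \beta(t)}{\alpha(t)}, \frac{Y - b(t)}{a(t)}\right) \in \cdot\,\right] \to \mu(\cdot) \quad \text{in } M_+([-\infty, \infty] \times E_\gamma), \tag{5.11}$$

where $\mu$ is a non-null Radon measure satisfying the conditional non-degeneracy conditions (2.9), if and only if $\alpha, \beta \in \mathrm{ERV}_{\rho,k}$. In this case, the limit measure $\mu$ is specified by

$$\mu([-\infty, x] \times (y, \infty]) = \int_0^{(1 + \gamma y)^{-1/\gamma}} du\, P[\xi \le u^\rho x + \psi(u)], \quad x \in \mathbb{R},\ y \in E_\gamma, \tag{5.12}$$

with $\psi$ as in (2.2). The expression (5.12) is continuous in $x$ and $y$ if $(\rho, k) \ne (0, 0)$.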


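Proof. First, observe that $Y^* = b^{*\leftarrow}(Y) \in D(G_1)$. Defining the transition function $K^*(y, \cdot) = P[X \in \cdot \mid Y^* = y]$ as in (5.5), our hypotheses imply (5.9). Therefore, if $\alpha, \beta \in \mathrm{ERV}_{\rho,k}$, then by Theorem 4.1, we have

$$t P\left[\left(\frac{X - \beta(t)}{\alpha(t)}, \frac{Y^*}{t}\right) \in \cdot\,\right] \to \mu^*(\cdot) \quad \text{in } M_+([-\infty, \infty] \times (0, \infty]),$$

where $\mu^*$ is defined by

$$\mu^*([-\infty, x] \times (y, \infty]) = \int_0^{y^{-1}} du\, P[\xi \le u^\rho x + \psi(u)], \quad x \in \mathbb{R},\ y > 0,$$

conditionally non-degenerate. Consequently, for $x \in \mathbb{R}$ and $y \in E_\gamma$,

$$\begin{aligned} t P\left[\frac{X - \beta(t)}{\alpha(t)} \le x, \frac{Y - b(t)}{a(t)} > y\right] &= t P\left[\frac{X - \beta(t)}{\alpha(t)} \le x, \frac{Y^*}{t} > \frac{b^{*\leftarrow}(a(t)y + b(t))}{t}\right] \\ &\to \int_0^{(1 + \gamma y)^{-1/\gamma}} du\, P[\xi \le u^\rho x + \psi(u)] = \mu([-\infty, x] \times (y, \infty]), \end{aligned}$$

and the marginal transformation of $Y$ does not affect conditional non-degeneracy or continuity. Conversely, (5.11) implies that $\alpha, \beta \in$ ERV [9, Proposition 1]. □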


Remark.  Instead  of  standardizing  Y,  we  could  equally  have  used  the  convergence  (5.8),  which 
holds  under  our  assumptions  by  Propositions  4.1  and  5.2. 
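Recalling the forms of the limit measure given in Section 4.1, we can express the limit measure in (5.12) as

$$\mu([-\infty, x] \times (y, \infty]) = \begin{cases} \dfrac{1}{|\rho|\, |x + k\rho^{-1}|^{1/\rho}} \displaystyle\int_{I_\rho} u^{(1-\rho)/\rho}\, P[\xi \le u\,\mathrm{sgn}(x + k\rho^{-1}) - k\rho^{-1}]\, du, & \rho \ne 0, \\[2ex] \dfrac{1}{|k|\, e^{x/k}} \displaystyle\int_{-\infty}^{x\,\mathrm{sgn}(k) - |k|\gamma^{-1}\log(1 + \gamma y)} e^{u/|k|}\, P[\xi \le u\,\mathrm{sgn}(k)]\, du, & \rho = 0,\ k \ne 0, \\[2ex] (1 + \gamma y)^{-1/\gamma} P[\xi \le x], & \rho = 0,\ k = 0, \end{cases}$$

where $I_\rho = (0, |x + k\rho^{-1}|(1 + \gamma y)^{-\rho/\gamma}]$ if $\rho > 0$ and $I_\rho = [|x + k\rho^{-1}|(1 + \gamma y)^{-\rho/\gamma}, \infty)$ if $\rho < 0$, and $\mathrm{sgn}(v) = (v/|v|)\,\mathbf{1}_{\{v \ne 0\}}$. We read the measure as $(1 + \gamma y)^{-1/\gamma} P[\xi \le -k\rho^{-1}]$ when $x = -k\rho^{-1}$ for the case $\rho \ne 0$.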

In Example 4.1, we presented a transition function satisfying (4.1) which did not lead to a CEVM when paired with $Y \in D(G_1)$. We now show that a non-degenerate CEVM may be obtained if $Y$ belongs to a non-standardized domain of attraction.


Example 5.1. Consider $Y \sim \text{Exp}(1)$, and $U \sim \text{Uniform}(0, 1)$, independent of $Y$. Put $X = U e^Y$. Note that $Y \in D(G_0)$ with $a(t) = 1$, $b(t) = \log t$, since for $y \in \mathbb{R}$,

$$t P[Y > y + \log t] = t e^{-y - \log t} = e^{-y}.$$

A version of the conditional distribution is given by

$$K(y, [0, x]) = P[X \le x \mid Y = y] = P[U \le x e^{-y}] = x e^{-y} \wedge 1.$$

Taking $\tilde\alpha(t) = e^t$, we saw in Example 4.1 that

$$K(t, \tilde\alpha(t)[0, x]) \Rightarrow x \wedge 1 = G([0, x]),$$

although $\tilde\alpha$ is not regularly varying. Since $b$ is continuous and strictly monotone, set $\alpha(t) = \tilde\alpha(b(t)) = t$. Then

$$K^*(t, \alpha(t)[0, x]) = K(b(t), \tilde\alpha(b(t))[0, x]) \Rightarrow G([0, x]),$$

and $\alpha \in \mathrm{RV}_1$. Hence, $K^*(tu, \alpha(t)[0, x]) \Rightarrow x u^{-1} \wedge 1 = G(u^{-1}[0, x])$, and $K^*(y, \cdot) = P[X \in \cdot \mid e^Y = y]$. On the other hand, note that for $u \in \mathbb{R}$,

$$K(a(t)u + b(t), \alpha(t)[0, x]) = t x e^{-u - \log t} \wedge 1 = x e^{-u} \wedge 1 = G((e^u)^{-1}[0, x]).$$

This illustrates the equivalence presented in Proposition 5.2. Now, for $x > 0$, $y > 0$, the joint distribution is given by

$$P[X \le x, Y > y] = \int_{\log x \vee y}^{\infty} x e^{-2u}\, du + \mathbf{1}_{\{y < \log x\}} \int_y^{\log x} e^{-u}\, du.$$

Therefore, for $x > 0$, $y \in \mathbb{R}$, and large $t$, we have

$$t P[X \le tx, Y > y + \log t] = \begin{cases} \dfrac{x}{2} e^{-2y}, & \log x \le y, \\[1ex] e^{-y} - \dfrac{1}{2x}, & \log x > y \end{cases} = \mu([0, x] \times (y, \infty]),$$

and $(X, Y)$ follow a CEVM by Theorem 5.1.
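The closed form in Example 5.1 can be cross-checked against (5.12). Here $\gamma = 0$, $\alpha(t) = t$ and $\beta \equiv 0$, so $(\rho, k) = (1, 0)$ and $\psi \equiv 0$. The limit $G([0, x]) = x \wedge 1$ is the Uniform(0,1) distribution. Formula (5.12) then reads $\mu([0, x] \times (y, \infty]) = \int_0^{e^{-y}} (ux \wedge 1)\, du$. The Python sketch compares three quantities: this integral, the piecewise formula of the example, and the exact value of $t P[X \le tx, Y > y + \log t]$ computed from the joint distribution. The function names and test points are our own.

```python
import math

def joint_tail(x, y):
    """P[X <= x, Y > y] in Example 5.1 (x > 0, y >= 0), from the integral formula."""
    lower = max(math.log(x), y)
    value = 0.5 * x * math.exp(-2.0 * lower)      # integral of x e^{-2u} over (lower, inf)
    if y < math.log(x):
        value += math.exp(-y) - 1.0 / x           # integral of e^{-u} over (y, log x)
    return value

def mu_closed(x, y):
    """Piecewise limit measure mu([0, x] x (y, inf]) from Example 5.1."""
    if math.log(x) <= y:
        return 0.5 * x * math.exp(-2.0 * y)
    return math.exp(-y) - 1.0 / (2.0 * x)

def mu_512(x, y, n=20000):
    """(5.12) with gamma = 0, rho = 1, k = 0 and xi ~ Uniform(0, 1):
    the integral of min(u x, 1) over u in (0, e^{-y}), by the midpoint rule."""
    h = math.exp(-y) / n
    return h * sum(min((i + 0.5) * h * x, 1.0) for i in range(n))

t = 1e3
for x, y in [(0.5, 0.2), (3.0, 0.2), (3.0, 2.0)]:
    print(x, y, mu_closed(x, y), mu_512(x, y), t * joint_tail(t * x, y + math.log(t)))
```

Once $y + \log t \ge 0$ the scaled joint probability coincides with the limit exactly, which is why the example only needs $t$ large.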
6. Conclusions and Future Directions

Although dealing with conditional distributions in general raises certain issues surrounding identifiability, in many statistical contexts a conditional formulation such as (4.1) is convenient. For example, it may be appropriate to model $X$ as an explicit function of $Y$. Also, if we are working with distributions that have continuous densities, the natural choice of version of the conditional distribution is the absolutely continuous one, and other simplifications may be afforded.

The above development suggests that in such cases, the approach of Heffernan and Tawn [10] is reasonable, and will generally lead to a fairly parsimonious extremal model which can account for varying degrees of asymptotic independence. Heffernan and Tawn propose a semiparametric model, where the limit distribution $G$ is estimated nonparametrically, and the normalization functions $\alpha$ and $\beta$ belong to a parametric family. The extended regular variation of $\alpha$ and $\beta$ provides some justification for this last assumption. Furthermore, the formulas for the limit measure derived above show that by modeling conditional distributions, we obtain a simpler CEV model parametrized by the distribution $G$ and the pair $(\rho, k)$, along with $\gamma$, the extreme value index of $Y$.

The question of fitting a bivariate CEV model has been considered by Das and Resnick [5] and by Fougeres and Soulier [8]. These authors discuss statistics for detecting a CEV model and estimating the normalizing functions. However, many open questions remain, such as the asymptotic distributions of such estimators, and the appropriate method for nonparametric estimation of $G$. These problems may presumably be simplified substantially through the use of standardization for both $X$ and $Y$.

Also, a natural extension of the bivariate model discussed above would be to consider the conditional formulation for higher-dimensional vectors. Indeed, this was the original intention of Heffernan and Tawn, who apply their methodology to a five-dimensional air pollution dataset. It is not clear what would be the appropriate formulation of a model conditioning on more than one extreme variable, nor the connections between such a model and the usual multivariate domain of attraction. In particular, cases where asymptotic independence is present between some pairs of variables but not others would require careful treatment. However, a model based on conditional distributions as developed above should prove useful.